#web scraping com python

Exploring the Universe of Data Analysis with Python! Python is one of the most powerful and versatile programming languages for data analysis, and there are countless libraries that make the task easier and more efficient. The list below illustrates some of the main categories and libraries you can explore for different areas of data analysis:

🧮 1. Data Manipulation
Pandas: work with large datasets and perform complex operations.
NumPy: ideal for numerical computing and array manipulation.
Polars, Vaex, CuPy: optimized tools for working with large volumes of data.

📊 2. Data Visualization
Matplotlib, Seaborn, Plotly: create interactive, readable charts for analysis and presentation.
Bokeh, Altair, Folium: visualize information on maps and custom charts.

📈 3. Statistical Analysis
SciPy, Statsmodels, PyMC3: run in-depth statistical analyses and apply probabilistic models to your data.

🧠 4. Machine Learning
Scikit-learn, TensorFlow, PyTorch: essential tools for machine learning and artificial intelligence.
XGBoost, Keras, JAX: for advanced models and deep learning.

🗣️ 5. Natural Language Processing (NLP)
NLTK, spaCy, BERT: analyze text, translate, and perform complex language-processing tasks.

🌐 6. Web Scraping
Beautiful Soup, Selenium: extract data from websites to feed your analyses.

📅 7. Time Series Analysis
Prophet, Darts, Sktime: advanced tools for forecasting future trends from historical data.

🗄️ 8. Database Operations
Dask, PySpark, Hadoop: manage and process large volumes of distributed data.
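As a quick illustration of how a couple of these libraries fit together, here is a minimal sketch that loads a CSV with Pandas, aggregates it, and plots the result with Matplotlib. The file name and column names are made up for the example:

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical dataset: "sales.csv" with "category" and "amount" columns is assumed.
df = pd.read_csv("sales.csv")
summary = df.groupby("category")["amount"].sum()  # aggregate sales by category

summary.plot(kind="bar")  # quick bar chart via Matplotlib
plt.tight_layout()
plt.show()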
How to use a VPN in Python
VPN Configuration in Python
A VPN (Virtual Private Network) configuration in Python is an effective way to establish secure, protected connections over the internet. Using the Python programming language, developers can build their own tools to configure and manage VPN connections in a customized way.
By implementing a VPN in Python, you can ensure that transmitted data is encrypted, making communications more secure and better protected against possible intrusion or interception. A VPN can also be useful for bypassing geographic restrictions, allowing access to content that would not normally be available in certain regions.
To configure a VPN in Python, you need specific libraries, such as 'VPN-Tools', which offers functionality for creating and managing VPN connections in a simple, direct way. It is also important to pay attention to security settings, such as choosing appropriate encryption protocols and implementing measures to protect data privacy.
In short, configuring a VPN in Python can be an excellent option for users who want to increase the security of their online connections, guaranteeing the privacy and confidentiality of the information transmitted. With the right tools and knowledge, it is possible to create and manage a VPN efficiently and in a customized way, meeting the specific needs of each project or application.
VPN Libraries for Python
VPN libraries for Python are essential tools for developers who want to add virtual private network (VPN) functionality to their Python applications. With these libraries, it is possible to create secure, encrypted connections, protecting the privacy and security of data transmitted over the network.
One of the most popular libraries for implementing this kind of functionality in Python is pyOpenSSL. It offers robust support for SSL/TLS encryption, allowing secure connections to be created over different protocols. pyOpenSSL is also open source and widely used by the Python development community.
Another important library is pyVpn, which provides a simple, intuitive interface for configuring VPN connections in Python. With pyVpn, developers can easily integrate VPN functionality into their applications, helping to secure the data transmitted over the internet.
It is essential to use reliable, up-to-date VPN libraries in order to guarantee the security and integrity of network connections. It is also advisable to follow security best practices when implementing VPN functionality in Python applications, such as user authentication, data encryption and connection monitoring.
In short, VPN libraries for Python are essential resources for ensuring the security and privacy of applications developed in Python. With these tools, developers can create secure, encrypted connections, protecting data transmitted over the network against possible threats.
VPN Connections in Python Scripts
A VPN, or Virtual Private Network, is an essential tool for protecting the privacy and security of your internet connection. In the context of Python scripts, a VPN connection can be used for several purposes, such as building web-scraping bots, automating online tasks, or even bypassing geographic blocks on sites and services.
There are several packages in Python that help establish a VPN connection in a simple, effective way. Using packages such as openvpn-api, tunnelblick, pyopenvpn and vpnc, it is possible to establish secure, encrypted connections to VPN servers in different parts of the world.
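None of these packages is strictly required: a script can also simply drive an OpenVPN client that is already installed on the machine. The sketch below assumes the openvpn binary is on the PATH and that a client.ovpn configuration file exists (both are placeholders), and it usually needs administrator/root privileges:

import subprocess

def start_vpn(config_path="client.ovpn"):
    # Launch the OpenVPN client as a child process and return a handle to it.
    return subprocess.Popen(
        ["openvpn", "--config", config_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )

vpn_process = start_vpn()
# ... run scraping or other network tasks here, with traffic going through the tunnel ...
vpn_process.terminate()  # close the tunnel when finished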
When using a VPN connection in Python scripts, it is important to keep in mind the security of the transmitted data, the choice of reliable servers, and the correct configuration of the VPN. It is also essential to respect the terms of use and privacy policies of the VPN services involved.
Finally, a VPN connection in Python scripts can be a powerful tool for protecting your online privacy, bypassing geographic restrictions, and securing your communications on the internet. With solid knowledge of programming and information security, you can explore the full potential of this technology and improve your skills as a Python developer.
Network Security with a VPN in Python
Network security is an important concern when it comes to protecting confidential information and sensitive data. One effective way to protect online communication is through the use of a VPN (Virtual Private Network).
A VPN driven from Python is a powerful tool that enables a secure, encrypted connection over public or private networks. By creating a secure communication tunnel, it allows data to travel in a protected way, preventing third parties from intercepting or accessing confidential information.
By implementing a VPN in Python, you can help guarantee the privacy and integrity of the transmitted data, making communications more secure and reliable. It also lets you access network resources remotely while keeping information confidential, even on public or untrusted Wi-Fi networks.
It is worth noting that network security with a VPN in Python requires technical knowledge to configure and maintain the infrastructure properly. It is essential to adopt cybersecurity practices to protect users' data and privacy, using digital certificates and robust encryption keys.
In short, using a VPN in Python is a sensible measure for reinforcing network security, protecting online communications and preserving the confidentiality of information in increasingly vulnerable virtual environments.
Proxies and VPNs in Python Development
When it comes to Python development, using a proxy or VPN can be extremely useful for protecting the security and privacy of data. Proxies and VPNs are technologies that hide a device's real IP address, providing anonymity and protecting against possible cyberattacks.
In Python development, proxies and VPNs can be integrated in several ways. One option is to use libraries and modules that make secure communication with remote servers easier. For example, the requests library makes it straightforward to configure proxies, providing a protected connection.
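For example, a minimal sketch of routing requests traffic through a proxy looks like the following; the proxy address and credentials are placeholders for a server you actually control or rent:

import requests

proxies = {
    "http": "http://user:password@proxy.example.com:8080",
    "https": "http://user:password@proxy.example.com:8080",
}

# httpbin.org/ip simply echoes back the IP address the server sees.
response = requests.get("https://httpbin.org/ip", proxies=proxies, timeout=30)
print(response.json())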
Using a VPN in Python development can also be essential when handling sensitive information or running security tests on web applications. Over a VPN connection, transmitted data can be encrypted, keeping information confidential and reducing the risk of leaks.
Finally, when using proxies and VPNs in Python development, it is essential to comply with data protection laws and regulations. Adopt good cybersecurity practices and keep your libraries and tools up to date to avoid vulnerabilities.
In short, integrating proxies and VPNs into Python development can contribute significantly to data security and privacy. By adopting preventive measures and protection technologies, developers can safeguard the integrity of the applications and systems they build in Python.
0 notes
Text
How to Scrape Target Store Location Data from Target.com Using Python
https://www.locationscloud.com/how-to-scrape-target-store-locations-from-target-com-using-python/
Web data scraping is a quicker and better-organized way of getting details about store locations than spending time collecting the information manually. This tutorial covers scraping the store location and contact data available on Target.com, one of the biggest discount store retailers in the USA. For this tutorial, our Target store locator will scrape the details of Target store locations for a given zip code.
We can scrape the following data fields:
1. Store Name
2. Store Address
3. Week Days
4. Phone Number
5. Hours Open
Here is a screenshot of the data that will be scraped as part of this tutorial.
There is plenty more data we could extract from a store details page on Target.com, such as grocery and pharmacy timings; however, we'll stick with the fields above.
Coding
If the embed given here doesn't render, you can use the link below.
https://www.locationscloud.com/how-to-scrape-target-store-locations-from-target-com-using-python/
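Since the embedded code may not render here, the sketch below shows the general shape of such a store-locator scraper. It is illustrative only: the URL pattern and CSS selectors are placeholders rather than Target.com's actual endpoint or markup, which change over time:

import csv
import requests
from bs4 import BeautifulSoup

def scrape_stores(zip_code):
    # Placeholder endpoint; a real scraper would target Target's store-locator page or API.
    url = f"https://www.example.com/store-locator?zip={zip_code}"
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
    soup = BeautifulSoup(response.text, "html.parser")

    stores = []
    for card in soup.select("div.store-card"):  # assumed selector
        stores.append({
            "name": card.select_one(".store-name").get_text(strip=True),
            "address": card.select_one(".store-address").get_text(strip=True),
            "phone": card.select_one(".store-phone").get_text(strip=True),
            "hours": card.select_one(".store-hours").get_text(strip=True),
        })
    return stores

def save_csv(stores, path="target_stores.csv"):
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "address", "phone", "hours"])
        writer.writeheader()
        writer.writerows(stores)

save_csv(scrape_stores("10001"))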
Link: Web Scraping with Python
{ "translatorID": "552cdac3-f130-4763-a88e-8e74b92dcb1b", "label": "Tumblr", "creator": "febrezo", "target": "^https?://[^/]+\.tumblr\.com/", "minVersion": "3.0", "maxVersion": "", "priority": 100, "inRepository": true, "translatorType": 4, "browserSupport": "gcsibv", "lastUpdated": "2021-06-01 23:04:10" }
/*
    Tumblr Translator
    Copyright (C) 2020 Félix Brezo, [email protected]

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU Affero General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    GNU General Public License for more details.

    You should have received a copy of the Affero GNU General Public License
    along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
function detectWeb(doc, url) {
    if (url.match(/^https?:\/\/www./)) {
        // only try to translate subdomain blogs
        return false;
    }
    if (url.includes('/post/')) {
        return "blogPost";
    }
    if (url.includes('/search/') && getSearchResults(doc, true)) {
        return "multiple";
    }
    return "webpage";
}
function getSearchResults(doc, checkOnly) {
    var items = {};
    var found = false;
    var rows = doc.querySelectorAll('#posts article');
    for (let row of rows) {
        let href = row.querySelector('a.post-notes').href;
        let title = ZU.trimInternal(text(row, '.body-text p') || text(row, 'a.tag-link'));
        if (!href || !title) continue;
        if (checkOnly) return true;
        found = true;
        items[href] = title;
    }
    return found ? items : false;
}
function doWeb(doc, url) {
    if (detectWeb(doc, url) == "multiple") {
        Zotero.selectItems(getSearchResults(doc, false), function (items) {
            if (items) ZU.processDocuments(Object.keys(items), scrape);
        });
    }
    else {
        scrape(doc, url);
    }
}
function scrape(doc, url) {
    var resourceType = detectWeb(doc, url);

    // Creating the item
    var newItem = new Zotero.Item(resourceType);

    var urlParts = url.split('/');
    var tmpDate;
    if (resourceType == "blogPost") {
        newItem.blogTitle = ZU.xpathText(doc, "//meta[@property='og:site_name']/@content");
        newItem.title = ZU.xpathText(doc, "//meta[@property='og:title']/@content");
        tmpDate = ZU.xpathText(doc, '(//div[@class="date-note-wrapper"]/a)[1]');
        if (!tmpDate) {
            tmpDate = ZU.xpathText(doc, '//div[@class="date"]/text()');
        }
        newItem.date = ZU.strToISO(tmpDate);
    }
    else {
        newItem.title = ZU.xpathText(doc, "//title/text()");
        newItem.websiteTitle = ZU.xpathText(doc, "//meta[@name='description']/@content");
    }

    var tmpAuthor = urlParts[2].split(".")[0];
    if (tmpAuthor) {
        newItem.creators.push({
            lastName: tmpAuthor,
            creatorType: "author",
            fieldMode: 1
        });
    }

    newItem.websiteType = "Tumblr";
    newItem.url = url;

    // Adding the attachment
    newItem.attachments.push({
        title: "Tumblr Snapshot",
        mimeType: "text/html",
        url: url
    });

    newItem.complete();
}
/** BEGIN TEST CASES **/
var testCases = [
    {
        "type": "web",
        "url": "https://blogdeprogramacion.tumblr.com/post/167688373297/c%C3%B3mo-integrar-opencv-y-python-en-windows",
        "items": [
            {
                "itemType": "blogPost",
                "title": "¿Cómo integrar OpenCV y Python en Windows?",
                "creators": [{ "lastName": "blogdeprogramacion", "creatorType": "author", "fieldMode": 1 }],
                "date": "2017-11-19",
                "blogTitle": "Blog de Programacion y Tecnologia",
                "url": "https://blogdeprogramacion.tumblr.com/post/167688373297/c%C3%B3mo-integrar-opencv-y-python-en-windows",
                "websiteType": "Tumblr",
                "attachments": [{ "title": "Tumblr Snapshot", "mimeType": "text/html" }],
                "tags": [],
                "notes": [],
                "seeAlso": []
            }
        ]
    },
    {
        "type": "web",
        "url": "https://blogdeprogramacion.tumblr.com/",
        "items": [
            {
                "itemType": "webpage",
                "title": "Blog de Programacion y Tecnologia",
                "creators": [{ "lastName": "blogdeprogramacion", "creatorType": "author", "fieldMode": 1 }],
                "url": "https://blogdeprogramacion.tumblr.com/",
                "websiteTitle": "Blog de programacion, tecnologia, electronica, tutoriales, informatica y sistemas computacionales.",
                "websiteType": "Tumblr",
                "attachments": [{ "title": "Tumblr Snapshot", "mimeType": "text/html" }],
                "tags": [],
                "notes": [],
                "seeAlso": []
            }
        ]
    },
    {
        "type": "web",
        "url": "https://montereybayaquarium.tumblr.com/search/turtle",
        "items": "multiple"
    }
]
/** END TEST CASES **/
How to Web Scrape Amazon.com
The fetchShelves() function currently returns only the item's title, so let's grab the rest of the information we need. Add the following lines of code after the line where we defined the title variable. You will probably want to scrape several pages' worth of data for this project. So far, we are only scraping page 1 of the search results. Let's configure ParseHub to navigate to the next 10 results pages.
What can data scraping be used for?
Web scraping APIs: the most convenient option presents a neat interface. All you need to do is point and click on what you want to scrape. Take part in one of our free live online data analytics events with industry experts, and read about Azadeh's journey from college teacher to data analyst. Get a hands-on introduction to data analytics and carry out your first analysis with our free, self-paced Data Analytics Short Course.
Scraping Amazon.com: FAQ
Using the find() function, available for searching for particular tags with specific attributes, we locate the Tag objects containing the title of the product. With the help of the URL, we will send the request to the page to access its data. Python: the ease of use and a substantial collection of libraries make Python the number one choice for scraping websites. However, if you do not have it pre-installed, refer below. Our Python Scrapy Consulting Service has helped companies in selecting servers, proxies and IPs, and with tips on data maintenance.
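A minimal version of that title-extraction step might look like the sketch below. The product URL is a placeholder and the "productTitle" id is an assumption based on common Amazon scraping tutorials; Amazon's markup and anti-bot measures change frequently:

import requests
from bs4 import BeautifulSoup

url = "https://www.amazon.com/dp/EXAMPLE"  # placeholder product URL
headers = {"User-Agent": "Mozilla/5.0", "Accept-Language": "en-US,en;q=0.9"}

response = requests.get(url, headers=headers, timeout=30)
soup = BeautifulSoup(response.text, "html.parser")

# find() returns the first tag matching the given name and attributes.
title_tag = soup.find("span", attrs={"id": "productTitle"})
print(title_tag.get_text(strip=True) if title_tag else "Title not found")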
Ensure your fingerprint parameters are consistent, or choose Web Unblocker, an AI-powered proxy solution with dynamic fingerprinting functionality.
BeautifulSoup is another Python library, commonly used to parse data from XML and HTML documents.
If you do not have Python 3.8 or above installed, head to python.org and download and install Python.
The case study shows how Actowiz has helped an FMCG business optimize its purchasing processes by extracting competitors' data.
Gather real-time flight and hotel data and build a solid strategy for your travel business.
We already discussed that web scraping isn't always as straightforward as following a step-by-step process. Here's a list of extra things to think about before scraping a website. BeautifulSoup is another Python library, generally used to parse data from XML and HTML documents.
Bright Data
By organizing this parsed content into more accessible trees, BeautifulSoup makes navigating and working through large swathes of data much easier. Web scraping is a technique used to gather content and data from the internet. This data is normally saved in a local file so that it can be manipulated and analyzed as needed. If you have ever copied and pasted content from a website into an Excel spreadsheet, this is essentially what web scraping is, but on a very small scale. The amount of data in our lives is growing exponentially. With this rise, data analytics has become a hugely important part of the way organizations are run.
Get the free guide that will show you exactly how to use proxies to avoid blocks, bans and captchas in your business. Pricing should be reasonable and at a rate that reflects the value of the whole proxy package. The ideal proxy package includes a sophisticated user dashboard that makes your task effortless. Trustworthy proxies keep your data safe and allow you to browse the web without interruption. CareerFoundry is an online school for people looking to switch to a rewarding career in tech.
Location-Based Data Scraping
Web scrapers across the globe gather tons of information for either personal or professional use. In addition, present-day technology giants depend on such web scraping approaches to meet the demands of their customer base. Yes, scraping can be detected by anti-bot software that checks your IP address, browser parameters, user agents and other details. After being detected, the site will throw a CAPTCHA, and if it is not solved, your IP will get blocked. Requests is a popular third-party Python library for making HTTP requests. It offers a simple and user-friendly interface for making HTTP requests to web servers and getting responses.
All you need to do is pick one of the data points, and every other one that follows the same pattern is going to be highlighted. As you probably already expected, their starter plan does have some limitations, but the good news is that you can download the results onto your desktop. We can scrape up to 1 million data points every hour and are capable of much more. When you crawl a massive amount of data, you need to store it somewhere, so setting up a database to save and access the data is necessary.
Future Programming Languages 2025 2030
Which are the best future programming languages for 2025 and 2030? When programmers are about to start their coding journey, it is difficult to decide where to begin. Here is a list of programming languages expected to be in high demand in 2025 and 2030.
Which programming languages will be in high demand in 2025 and 2030?
Swift
If you are a mobile developer, Swift is perfect for you! Apple developed it for creating iOS and macOS applications. It remains one of the most in-demand languages of 2021 and will continue to be in high demand in 2025 and 2030. Swift is also easy to learn and supports almost everything from Objective-C. It is a general-purpose, multi-paradigm, compiled programming language. It is Apple-centric, and if you become good with it, it can be easier to earn more than Android developers. Swift is fast, efficient and secure, and enables a high level of interactivity by combining cutting-edge language features. It was built using a modern approach to safety, performance and software design patterns. The goal of the Swift project is to create the best available language for uses ranging from systems programming to mobile and desktop apps, scaling up to cloud services. Companies using Swift: Apple, Lyft, Uber.
Python
Python is undoubtedly a powerhouse. Its applications extend across many domains, including web development, data science, data visualisation, machine learning, artificial intelligence and web scraping. It is one of the most popular languages, it is very easy to learn, and it has a vast community and many open-source projects. Its main drawback is slow execution, since it is an interpreted, high-level language. Python is at the top of job demand and has among the highest average wages in the tech industry. It is a great language for beginners and is often used as a scripting language for web applications. Python is the lingua franca of machine learning and data science. Python's popularity rose by 3.48%, which is very impressive. Python is dynamically typed: you don't need to declare the type of a variable. Its syntax is easy to remember and reads almost like natural language. Companies using Python: Instagram, Amazon, Facebook and Spotify.
Java
Java is the leading enterprise programming language at the moment and will also be in high demand in 2025 and 2030. It is a general-purpose language used for web pages and much more, it is the dominant language for Android, and it is powerful. It supports distributed computing and multi-threading. It is very secure and runs some of the biggest enterprises and data centers globally. Today 15 billion devices run Java, and it is used by 10 million developers worldwide. It is freely available and runs on all major operating systems. Java is well suited to embedded and cross-platform applications. Java has a large number of frameworks but tends toward long lines of code. It is used to develop desktop and mobile applications, big data processing, embedded systems, and so on. Companies using Java: Uber, Netflix, Instagram, Google.
Kotlin
Thanks to effortless interoperation between Java and Kotlin, Android development is faster and more enjoyable. Kotlin addresses the major issues that surfaced in Java, and developers have rewritten several Java apps in Kotlin. The syntax is easy for beginners to learn, and it offers a host of powerful features. It can be a great language for experienced programmers to upskill in, with a shallow learning curve, especially if you have experience in Python or Java. Kotlin is a cross-platform, statically typed, general-purpose programming language with type inference. It is developed to interoperate completely with Java. Recently, Google announced that Android development will be increasingly Kotlin-first, and many top apps have already migrated to Kotlin. Companies using Kotlin: Coursera, Uber, Pinterest.
JavaScript

It is the most popular language according to the Stack Overflow survey. It is widely known for adding interactive elements to web applications and browsers. JavaScript is the ultimate language of the web: almost every web and mobile application runs JavaScript. Since it is a client-side language, many simple applications don't need server support, although complex applications do place load on the server. Usage of this language keeps growing at an impressive rate. It is also the foundation of most libraries and frameworks for the web, such as React, Vue and Node. It can run inside nearly all modern web browsers. It is a programming language used primarily by web browsers to create dynamic and interactive experiences for users. Companies using JavaScript: PayPal, Google, Microsoft.
Rust
Rust is a multi-paradigm programming language focused on performance and safety. Rust is syntactically similar to C++. It offers memory safety without using garbage collection. Rust has great documentation, a friendly compiler with useful error messages, and top-notch tooling, including an integrated package manager and build tool. Rust is a language of the future: it is among the most loved languages in developer surveys and one of the highest-paying languages in the world. It empowers everyone to build reliable and efficient software. It has the speed and low-level access of languages like C/C++ with the memory safety of modern languages. This programming language can run on embedded devices, and Rust can easily integrate with other languages. Hundreds of companies around the world are using Rust in production today for fast, low-resource, cross-platform solutions. Companies using Rust: Dropbox, Figma, Discord.
C++
C++ is a powerful general-purpose programming language. It can be used to develop operating systems, browsers, games, and so on. C++ supports different styles of programming, such as procedural, object-oriented and functional, which makes it both powerful and flexible. C++ is old but gold. It is heavily used for professional software, game development and high-performance applications, including machine learning. It gives programmers a high level of control over system resources and memory. You can find this language in today's operating systems, graphical user interfaces and embedded systems. It is close to C# and Java, which makes it easy for programmers to switch between them. It was developed as an enhancement of the C language to include an object-oriented paradigm. Companies using C++: Evernote, Microsoft, Opera, Facebook.
PHP
PHP is a popular general-purpose scripting language that is especially suited to web development. Fast, flexible and pragmatic, PHP powers everything from your blog to some of the most popular websites in the world; statistics show it is used by around 80% of the top 10 million websites. It creates, reads, opens, deletes and closes files on the server. It controls user access and can encrypt data. A wonderful benefit of using PHP is that it can interact with many databases, including MySQL. PHP is free to download and use. It is powerful enough to be at the core of the biggest blogging system on the web (WordPress!), compatible with almost all servers used today, such as Apache and IIS, and deep enough to run the largest social network, Facebook. PHP can easily be embedded in HTML files, and HTML code can also be written in a PHP file. Companies using PHP: Facebook, Tumblr, Etsy, WordPress.
C#
C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET framework. It is used to develop web apps, desktop apps, games and much more. Microsoft developed C# as a rival to Java. It is heavily used in enterprise environments and for game development with the Unity engine. C# gives you a free hand to create applications not only for websites but also for mobile. Although it shares features with structured programming languages, it is considered an object-oriented programming language. There is a massive number of out-of-the-box solutions in this language that you won't find in many others, for example tools for unit testing, cryptography libraries, excellent collections handling and multi-threading. Companies using C#: CarMax, RTX, Twitch.
Scala
Scala is a programming language that combines object-oriented programming with functional programming. It has a strong static type system, is designed to be concise, and runs on the JVM. As a hybrid of two programming paradigms, it tries to address the criticisms of Java: you keep all the Java libraries and all the advantages of the JVM while writing more concise code. Scala is often used in data science. Scala is a very compatible language and can easily be installed on Windows and Unix operating systems. It helps developers make their business applications more productive, scalable and reliable. There is no concept of primitive data, as everything is an object in Scala. It is designed to express common programming patterns in a refined, succinct and type-safe way. Companies using Scala: Netflix, Sony, Twitter, LinkedIn.
5 Best Practices for Writing Better Code
Naming conventions
In computer programming, a naming convention is a set of rules for choosing the character sequence to be used for identifiers that denote variables, types, functions and other entities in source code and documentation. Three common types of naming convention are camel case, Pascal case and underscores (snake case).
Commenting
In computer programming, a comment is a programmer-readable explanation or annotation in the source code of a computer program. We all think our code makes sense, especially if it works, but someone else might not; to combat this, we all need to get better at commenting our source code.
Indentation
There is no single required indentation style; the best method is a consistent one. Once you start contributing to large projects you will immediately understand the importance of consistent code styling.
Follow the DRY principle
DRY stands for "Don't Repeat Yourself": you should not repeat the same piece of code over and over again. To avoid violating this principle, break your system into pieces, dissect your code and logic, and split them into smaller reusable units. Don't write lengthy methods; divide the logic and reuse existing pieces in your method. A small example follows below.
Follow the KIS principle
KIS stands for "Keep It Simple". After all, programming languages are for humans to understand; computers can only understand 0 and 1. So keep your code simple and straightforward. To avoid violating this principle, think of several solutions to your problem, choose the simplest one, and transform that into code. When you face lengthy code, split it into multiple methods (right-click and refactor in your editor), and try to write small blocks of code that do a single task.
Writer: Taniya Patyal
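Here is the small example referred to above: a sketch of DRY plus descriptive naming in Python, where repeated discount logic is pulled into one reusable function (the numbers are arbitrary):

def apply_discount(price, rate):
    """Return the price after applying a discount rate (e.g. 0.10 for 10%)."""
    return round(price * (1 - rate), 2)

# Instead of repeating `price * (1 - rate)` throughout the codebase:
book_total = apply_discount(25.00, 0.10)
laptop_total = apply_discount(900.00, 0.05)
print(book_total, laptop_total)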
What are the reasons to learn Python?
Are you an aspiring candidate looking for a bright future in the IT field? If so, go ahead and learn Python, a programming language well suited to web programmers and application developers. Its flexibility, versatility and object-oriented features help it integrate with various other programming languages. It is one of the most popular languages among developers, software engineers, data scientists and hackers. The digitizing world creates excellent opportunities for aspirants, and learning Python is a good way to pick up the fundamentals before moving on to other languages. In recent times many aspirants have flocked to learn Python and acquire its programming skills. As there is high demand for Python developers, a Python certification can empower your career with innovative technology.
Python is simple:
Python is simple to learn; its simplicity has made it an excellent choice for beginners. Its syntax is simple and highly readable, which makes it a beginner-friendly language. The Python learning curve is shorter than that of most other languages (Java, C, C++, etc.). Without worrying about the documentation, Python lets you head straight to the research part. This is why Python is used widely in both data science and development for data analysis, web development, text processing and statistical analysis, among other things.
Python is
Free and open source
High-level
Interpreted
Blessed with a large community
Python is flexible and extensible:
Python is highly extensible, and its flexibility allows you to perform cross-language operations without any hassle. You can not only integrate it with Java and .NET components but also invoke C/C++ libraries. Python is supported by all modern platforms, including Windows, Linux, macOS and Solaris.
Python's libraries cater to your needs:
Python can boast many useful libraries, a choice assortment that comes in handy for development and data science tasks. It has NumPy, Matplotlib, SciPy, Scikit-learn, Pandas, Statsmodels and much more. Python's functionality and capabilities have multiplied significantly over the years, thanks to this vast collection of libraries.
One of the earliest Python libraries is NumPy, which provides high-level mathematical functions and multi-dimensional matrices and arrays; for scientific computing, it is a perfect choice. The scientific counterpart of NumPy is SciPy, which is equipped with everything you need for numerical integration and analysis of scientific data. Another popular Python library is Pandas, which was built on top of NumPy and is primarily used for data analysis.
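A tiny sketch of NumPy and Pandas working together (the numbers are invented for illustration):

import numpy as np
import pandas as pd

scores = np.array([72, 85, 90, 66, 78])
print(scores.mean(), scores.std())  # vectorized statistics with NumPy

df = pd.DataFrame({"student": list("ABCDE"), "score": scores})
print(df.describe())  # quick summary statistics with Pandas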
Python makes web development a breeze:
Python makes the web development process easy, and this is another reason to learn it. There is a wide variety of web development frameworks in Python, such as Django, Flask, Pyramid, Web2Py, Bottle, Falcon, Sanic, TurboGears and FastAPI. These frameworks help developers write stable code much faster and reduce development time by automating the repetitive parts of implementation, letting you concentrate on more critical elements such as application logic. Apart from this, web scraping tasks can also be performed with Python frameworks.
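As a taste of how lightweight these frameworks can be, here is a minimal Flask application (Flask being one of the frameworks listed above):

from flask import Flask

app = Flask(__name__)

@app.route("/")
def home():
    return "Hello from Flask!"

if __name__ == "__main__":
    app.run(debug=True)  # development server only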
Data visualization:
Python has something for every need and packs plenty of data visualization options. Some of the most popular data visualization tools in Python are Matplotlib, Pygal, Plotly, Altair, Seaborn, Bokeh, Geoplotlib, Gleam and Missingno. You can easily make sense of complex datasets with these data visualization frameworks, and your findings can be presented through various options such as graphs, graphical plots, pie charts and web-ready interactive plots.
Artificial intelligence:
In the tech world, AI is set to revolutionize development. With Python, you can build systems that mimic aspects of the human brain's ability to think, analyze and make decisions.
Python has numerous testing frameworks:
When it comes to validating ideas or testing products, Python is the way to go: it has several built-in and third-party testing frameworks that help debug code and speed up workflows. An online Python certification can help you understand the multiple testing frameworks used to validate products. Python supports both cross-browser and cross-platform testing with frameworks like PyTest and Robot, and there are other testing frameworks such as Behave, UnitTest and Lettuce.
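A minimal pytest example looks like this; save it as test_prices.py and run the pytest command in the same directory (the function being tested is just an illustration):

# test_prices.py
def apply_discount(price, rate):
    return round(price * (1 - rate), 2)

def test_apply_discount():
    assert apply_discount(100.0, 0.25) == 75.0

def test_no_discount():
    assert apply_discount(50.0, 0.0) == 50.0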
Python is best for Enterprise Application Integration (EAI)
Python is an excellent choice for enterprise application integration, as it can be embedded seamlessly in applications written in other languages. Python can not only invoke CORBA/COM components but also call directly from and to C, C++ and Java code. This strong integration with C, C++ and Java makes it well suited to application scripting. Python's integration capabilities and text processing are highly commendable, and it can also be used for developing GUI and desktop applications.
Python is great for scripting:
Python is also a scripting tool; it doesn't require a separate compilation step. You can write code in the form of scripts and execute it directly. The machine reads and interprets your code and performs error checking at runtime, which makes Python well suited to both programming and scripting.
Python is backed by an active community:
Python has a dynamic community: if you have coding-related or data science issues, you can rely on the Python community, which is always ready to help. The community is growing day by day and enriches the language by developing new tools and libraries, with developers and coders actively contributing.
Python skills can command high salaries:
If you have Python skills you can expect high salaries in the industry. At present, Python leads the development and data science fields, promising strong growth and good salary prospects. There is steady demand for Python developers, and this fast-growing programming language carries real weight in the global job market. By learning Python you can accelerate your career growth and command an impressive pay scale.
Wrapping it up:
In the IT industry, Python has emerged as a leading programming language and a solid foundation for the future. It is an extremely powerful language and a strong career option for developers, and it is easy to apply to emerging technologies. With these reasons in mind, enrol in a Python programming course at your convenience and learn from certified industry experts. You can gain excellent career opportunities and real-world knowledge by learning Python, and contribute effectively to the technology industry.
Photo: Python in Practice doing Web Scraping (of dynamic JavaScript) // Mão no Código #28
SEO Analytics for Free - Combining Google Search with the Moz API
Posted by Purple-Toolz
I’m a self-funded start-up business owner. As such, I want to get as much as I can for free before convincing our finance director to spend our hard-earned bootstrapping funds. I’m also an analyst with a background in data and computer science, so a bit of a geek by any definition.
What I try to do, with my SEO analyst hat on, is hunt down great sources of free data and wrangle it into something insightful. Why? Because there’s no value in basing client advice on conjecture. It’s far better to combine quality data with good analysis and help our clients better understand what’s important for them to focus on.
In this article, I will tell you how to get started using a few free resources and illustrate how to pull together unique analytics that provide useful insights for your blog articles if you’re a writer, your agency if you’re an SEO, or your website if you’re a client or owner doing SEO yourself.
The scenario I’m going to use is that I want to analyze some SEO attributes (e.g. backlinks, Page Authority etc.) and look at their effect on Google ranking. I want to answer questions like “Do backlinks really matter in getting to Page 1 of SERPs?” and “What kind of Page Authority score do I really need to be in the top 10 results?” To do this, I will need to combine data from a number of Google searches with data on each result that has the SEO attributes in that I want to measure.
Let’s get started and work through how to combine the following tasks to achieve this, which can all be setup for free:
Querying with Google Custom Search Engine
Using the free Moz API account
Harvesting data with PHP and MySQL
Analyzing data with SQL and R
Querying with Google Custom Search Engine
We first need to query Google and get some results stored. To stay on the right side of Google’s terms of service, we’ll not be scraping Google.com directly but will instead use Google’s Custom Search feature. Google’s Custom Search is designed mainly to let website owners provide a Google like search widget on their website. However, there is also a REST based Google Search API that is free and lets you query Google and retrieve results in the popular JSON format. There are quota limits but these can be configured and extended to provide a good sample of data to work with.
When configured correctly to search the entire web, you can send queries to your Custom Search Engine, in our case using PHP, and treat them like Google responses, albeit with some caveats. The main limitations of using a Custom Search Engine are: (i) it doesn’t use some Google Web Search features such as personalized results and; (ii) it may have a subset of results from the Google index if you include more than ten sites.
Notwithstanding these limitations, there are many search options that can be passed to the Custom Search Engine to proxy what you might expect Google.com to return. In our scenario, we passed the following when making a call:
https://www.googleapis.com/customsearch/v1?key=<GOOGLE_API_ID>&userIp=<IP_ADDRESS>&cx=<CUSTOM_SEARCH_ENGINE_ID>&q=iPhone+X&cr=countryUS&start=1
Where:
https://www.googleapis.com/customsearch/v1 – is the URL for the Google Custom Search API
key=<GOOGLE_API_ID> – Your Google Developer API Key
userIp=<IP_ADDRESS> – The IP address of the local machine making the call
cx=<CUSTOM_SEARCH_ENGINE_ID> – Your Google Custom Search Engine ID
q=iPhone+X – The Google query string (‘+’ replaces ‘ ‘)
cr=countryUS – Country restriction (from Google’s Country Collection Name list)
start=1 – The index of the first result to return – e.g. SERP page 1. Successive calls would increment this to get pages 2–5.
Google has said that the Google Custom Search engine differs from Google .com, but in my limited prod testing comparing results between the two, I was encouraged by the similarities and so continued with the analysis. That said, keep in mind that the data and results below come from Google Custom Search (using ‘whole web’ queries), not Google.com.
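For readers who prefer Python to the PHP used later in this article, a rough sketch of the same call using the requests library might look like this (the API key and engine ID are placeholders for your own credentials, and the parsing assumes the documented items/title/link response structure):

import requests

API_KEY = "YOUR_GOOGLE_API_KEY"          # placeholder
CSE_ID = "YOUR_CUSTOM_SEARCH_ENGINE_ID"  # placeholder

params = {
    "key": API_KEY,
    "cx": CSE_ID,
    "q": "iPhone X",
    "cr": "countryUS",
    "start": 1,  # increment by 10 to page through results
}
response = requests.get("https://www.googleapis.com/customsearch/v1", params=params, timeout=30)
for item in response.json().get("items", []):
    print(item["link"], "-", item["title"])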
Using the free Moz API account
Moz provide an Application Programming Interface (API). To use it you will need to register for a Mozscape API key, which is free but limited to 2,500 rows per month and one query every ten seconds. Current paid plans give you increased quotas and start at $250/month. Having a free account and API key, you can then query the Links API and analyze the following metrics:
Moz data field | Moz API code | Description
-------------- | ------------ | -----------
ueid           | 32           | The number of external equity links to the URL
uid            | 2048         | The number of links (external, equity or nonequity or not) to the URL
umrp**         | 16384        | The MozRank of the URL, as a normalized 10-point score
umrr**         | 16384        | The MozRank of the URL, as a raw score
fmrp**         | 32768        | The MozRank of the URL's subdomain, as a normalized 10-point score
fmrr**         | 32768        | The MozRank of the URL's subdomain, as a raw score
us             | 536870912    | The HTTP status code recorded for this URL, if available
upa            | 34359738368  | A normalized 100-point score representing the likelihood of a page to rank well in search engine results
pda            | 68719476736  | A normalized 100-point score representing the likelihood of a domain to rank well in search engine results
NOTE: Since this analysis was captured, Moz documented that they have deprecated these fields. However, in testing this (15-06-2019), the fields were still present.
Moz API Codes are added together before calling the Links API with something that looks like the following:
www.apple.com%2F?Cols=103616137253&AccessID=MOZ_ACCESS_ID& Expires=1560586149&Signature=<MOZ_SECRET_KEY>
Where:
https://ift.tt/1bbWaai" class="redactor-autoparser-object">https://ift.tt/2oVcks4... – Is the URL for the Moz API
http%3A%2F%2Fwww.apple.com%2F – An encoded URL that we want to get data on
Cols=103616137253 – The sum of the Moz API codes from the table above
AccessID=MOZ_ACCESS_ID – An encoded version of the Moz Access ID (found in your API account)
Expires=1560586149 – A timeout for the query - set a few minutes into the future
Signature=<MOZ_SECRET_KEY> – A request signature generated with your Moz secret key (found in your API account)
Moz will return with something like the following (shown here decoded as a PHP array):
Array
(
    [ut] => Apple
    [uu] => www.apple.com/
    [ueid] => 13078035
    [uid] => 14632963
    [umrp] => 9
    [umrr] => 0.8999999762
    [fmrp] => 2.602215052
    [fmrr] => 0.2602215111
    [us] => 200
    [upa] => 90
    [pda] => 100
)
For a great starting point on querying Moz with PHP, Perl, Python, Ruby and Javascript, see this repository on Github. I chose to use PHP.
Harvesting data with PHP and MySQL
Now we have a Google Custom Search Engine and our Moz API, we’re almost ready to capture data. Google and Moz respond to requests via the JSON format and so can be queried by many popular programming languages. In addition to my chosen language, PHP, I wrote the results of both Google and Moz to a database and chose MySQL Community Edition for this. Other databases could be also used, e.g. Postgres, Oracle, Microsoft SQL Server etc. Doing so enables persistence of the data and ad-hoc analysis using SQL (Structured Query Language) as well as other languages (like R, which I will go over later). After creating database tables to hold the Google search results (with fields for rank, URL etc.) and a table to hold Moz data fields (ueid, upa, uda etc.), we’re ready to design our data harvesting plan.
Google provide a generous quota with the Custom Search Engine (up to 100M queries per day with the same Google developer console key), but the Moz free API is limited to 2,500. For Moz, paid-for options provide between 120k and 40M rows per month depending on the plan and range in cost from $250–$10,000/month. Therefore, as I’m just exploring the free option, I designed my code to harvest 125 Google queries over 2 pages of SERPs (10 results per page), allowing me to stay within the Moz 2,500-row quota. As for which searches to fire at Google, there are numerous resources to choose from. I chose to use Mondovo as they provide numerous lists by category and up to 500 words per list, which is ample for the experiment.
I also rolled in a few PHP helper classes alongside my own code for database I/O and HTTP.
In summary, the main PHP building blocks and sources used were:
Google Custom Search Engine – Ash Kiswany wrote an excellent article using Jacob Fogg’s PHP interface for Google Custom Search;
Mozscape API – As mentioned, this PHP implementation for accessing Moz on Github was a good starting point;
Website crawler and HTTP – At Purple Toolz, we have our own crawler called PurpleToolzBot which uses Curl for HTTP and this Simple HTML DOM Parser;
Database I/O – PHP has excellent support for MySQL which I wrapped into classes from these tutorials.
One factor to be aware of is the 10 second interval between Moz API calls. This is to prevent Moz being overloaded by free API users. To handle this in software, I wrote a "query throttler" which blocked access to the Moz API between successive calls within a timeframe. However, whilst working perfectly it meant that calling Moz 2,500 times in succession took just under 7 hours to complete.
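The same throttling idea, sketched here in Python rather than the PHP the harvester actually uses, is just a timestamp check before each call (the Moz request itself is left as a placeholder):

import time

class QueryThrottler:
    def __init__(self, interval=10.0):
        self.interval = interval   # minimum seconds between calls
        self.last_call = 0.0

    def wait(self):
        elapsed = time.time() - self.last_call
        if elapsed < self.interval:
            time.sleep(self.interval - elapsed)
        self.last_call = time.time()

throttler = QueryThrottler(10.0)
for url in ["https://www.example.com/a", "https://www.example.com/b"]:  # assumed URL list
    throttler.wait()
    # fetch_moz_metrics(url)  # placeholder for the actual Moz API request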
Analyzing data with SQL and R
Data harvested. Now the fun begins!
It’s time to have a look at what we’ve got. This is sometimes called data wrangling. I use a free statistical programming language called R along with a development environment (editor) called R Studio. There are other languages such as Stata and more graphical data science tools like Tableau, but these cost and the finance director at Purple Toolz isn’t someone to cross!
I have been using R for a number of years because it’s open source and it has many third-party libraries, making it extremely versatile and appropriate for this kind of work.
Let’s roll up our sleeves.
I now have a couple of database tables with the results of my 125 search term queries across 2 pages of SERPS (i.e. 20 ranked URLs per search term). Two database tables hold the Google results and another table holds the Moz data results. To access these, we’ll need to do a database INNER JOIN which we can easily accomplish by using the RMySQL package with R. This is loaded by typing "install.packages('RMySQL')" into R’s console and including the line "library(RMySQL)" at the top of our R script.
We can then do the following to connect and get the data into an R data frame variable called "theResults."
library(RMySQL)

# INNER JOIN the two tables
theQuery <- "
    SELECT A.*, B.*, C.* FROM
    (
        SELECT cseq_search_id FROM cse_query
    ) A -- Custom Search Query
    INNER JOIN
    (
        SELECT cser_cseq_id, cser_rank, cser_url FROM cse_results
    ) B -- Custom Search Results
    ON A.cseq_search_id = B.cser_cseq_id
    INNER JOIN
    (
        SELECT * FROM moz
    ) C -- Moz Data Fields
    ON B.cser_url = C.moz_url
    ;
"

# [1] Connect to the database
# Replace USER_NAME with your database username
# Replace PASSWORD with your database password
# Replace MY_DB with your database name
theConn <- dbConnect(dbDriver("MySQL"), user = "USER_NAME", password = "PASSWORD", dbname = "MY_DB")

# [2] Query the database and hold the results
theResults <- dbGetQuery(theConn, theQuery)

# [3] Disconnect from the database
dbDisconnect(theConn)
NOTE: I have two tables to hold the Google Custom Search Engine data. One holds data on the Google query (cse_query) and one holds results (cse_results).
We can now use R’s full range of statistical functions to begin wrangling.
Let’s start with some summaries to get a feel for the data. The process I go through is basically the same for each of the fields, so let’s illustrate and use Moz’s ‘UEID’ field (the number of external equity links to a URL). By typing the following into R I get the this:
> summary(theResults$moz_ueid)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      0       1      20   14709     182 2755274
> quantile(theResults$moz_ueid, probs = c(1, 5, 10, 25, 50, 75, 80, 90, 95, 99, 100)/100)
      1%       5%      10%      25%      50%      75%      80%      90%      95%      99%     100%
     0.0      0.0      0.0      1.0     20.0    182.0    337.2   1715.2   7873.4 412283.4 2755274.0
Looking at this, you can see that the data is skewed (a lot) by the relationship of the median to the mean, which is being pulled by values in the upper quartile range (values beyond 75% of the observations). We can however, plot this as a box and whisker plot in R where each X value is the distribution of UEIDs by rank from Google Custom Search position 1-20.
Note we are using a log scale on the y-axis so that we can display the full range of values as they vary a lot!
A box and whisker plot in R of Moz’s UEID by Google rank (note: log scale)
Box and whisker plots are great as they show a lot of information in them (see the geom_boxplot function in R). The purple boxed area represents the Inter-Quartile Range (IQR) which are the values between 25% and 75% of observations. The horizontal line in each ‘box’ represents the median value (the one in the middle when ordered), whilst the lines extending from the box (called the ‘whiskers’) represent 1.5x IQR. Dots outside the whiskers are called ‘outliers’ and show where the extents of each rank’s set of observations are. Despite the log scale, we can see a noticeable pull-up from rank #10 to rank #1 in median values, indicating that the number of equity links might be a Google ranking factor. Let’s explore this further with density plots.
Density plots are a lot like distributions (histograms) but show smooth lines rather than bars for the data. Much like a histogram, a density plot’s peak shows where the data values are concentrated and can help when comparing two distributions. In the density plot below, I have split the data into two categories: (i) results that appeared on Page 1 of SERPs ranked 1-10 are in pink and; (ii) results that appeared on SERP Page 2 are in blue. I have also plotted the medians of both distributions to help illustrate the difference in results between Page 1 and Page 2.
The inference from these two density plots is that Page 1 SERP results had more external equity backlinks (UEIDs) on than Page 2 results. You can also see the median values for these two categories below which clearly shows how the value for Page 1 (38) is far greater than Page 2 (11). So we now have some numbers to base our SEO strategy for backlinks on.
# Create a factor in R according to which SERP page a result (cser_rank) is on
> theResults$rankBin <- paste("Page", ceiling(theResults$cser_rank / 10))
> theResults$rankBin <- factor(theResults$rankBin)

# Now report the medians by SERP page by calling ‘tapply’
> tapply(theResults$moz_ueid, theResults$rankBin, median)
Page 1 Page 2
    38     11
From this, we can deduce that equity backlinks (UEID) matter and if I were advising a client based on this data, I would say they should be looking to get over 38 equity-based backlinks to help them get to Page 1 of SERPs. Of course, this is a limited sample and more research, a bigger sample and other ranking factors would need to be considered, but you get the idea.
Now let’s investigate another metric that has less of a range on it than UEID and look at Moz’s UPA measure, which is the likelihood that a page will rank well in search engine results.
> summary(theResults$moz_upa)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1.00   33.00   41.00   41.22   50.00   81.00
> quantile(theResults$moz_upa, probs = c(1, 5, 10, 25, 50, 75, 80, 90, 95, 99, 100)/100)
 1%  5% 10% 25% 50% 75% 80% 90% 95% 99% 100%
 12  20  25  33  41  50  53  58  62  75   81
UPA is a number given to a URL and ranges between 0–100. The data is better behaved than the previous UEID unbounded variable having its mean and median close together making for a more ‘normal’ distribution as we can see below by plotting a histogram in R.
A histogram of Moz’s UPA score
We’ll do the same Page 1 : Page 2 split and density plot that we did before and look at the UPA score distributions when we divide the UPA data into two groups.
# Report the medians by SERP page by calling ‘tapply’
> tapply(theResults$moz_upa, theResults$rankBin, median)
Page 1 Page 2
    43     39
In summary, two very different distributions from two Moz API variables. But both showed differences in their scores between SERP pages and provide you with tangible values (medians) to work with and ultimately advise clients on or apply to your own SEO.
Of course, this is just a small sample and shouldn’t be taken literally. But with free resources from both Google and Moz, you can now see how you can begin to develop analytical capabilities of your own to base your assumptions on rather than accepting the norm. SEO ranking factors change all the time and having your own analytical tools to conduct your own tests and experiments on will help give you credibility and perhaps even a unique insight on something hitherto unknown.
Google provide you with a healthy free quota for obtaining search results. If you need more than the 2,500 rows/month that Moz provide for free, there are numerous paid-for plans you can purchase. MySQL is a free download, and R is also a free package for statistical analysis (and much more).
Go explore!
Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!
via Blogger https://ift.tt/31JQAg8
0 notes
Text
SEO Analytics for Free - Combining Google Search with the Moz API
Posted by Purple-Toolz
I’m a self-funded start-up business owner. As such, I want to get as much as I can for free before convincing our finance director to spend our hard-earned bootstrapping funds. I’m also an analyst with a background in data and computer science, so a bit of a geek by any definition.
What I try to do, with my SEO analyst hat on, is hunt down great sources of free data and wrangle it into something insightful. Why? Because there’s no value in basing client advice on conjecture. It’s far better to combine quality data with good analysis and help our clients better understand what’s important for them to focus on.
In this article, I will tell you how to get started using a few free resources and illustrate how to pull together unique analytics that provide useful insights for your blog articles if you’re a writer, your agency if you’re an SEO, or your website if you’re a client or owner doing SEO yourself.
The scenario I’m going to use is that I want analyze some SEO attributes (e.g. backlinks, Page Authority etc.) and look at their effect on Google ranking. I want to answer questions like “Do backlinks really matter in getting to Page 1 of SERPs?” and “What kind of Page Authority score do I really need to be in the top 10 results?” To do this, I will need to combine data from a number of Google searches with data on each result that has the SEO attributes in that I want to measure.
Let’s get started and work through how to combine the following tasks to achieve this, which can all be setup for free:
Querying with Google Custom Search Engine
Using the free Moz API account
Harvesting data with PHP and MySQL
Analyzing data with SQL and R
Querying with Google Custom Search Engine
We first need to query Google and get some results stored. To stay on the right side of Google’s terms of service, we’ll not be scraping Google.com directly but will instead use Google’s Custom Search feature. Google’s Custom Search is designed mainly to let website owners provide a Google like search widget on their website. However, there is also a REST based Google Search API that is free and lets you query Google and retrieve results in the popular JSON format. There are quota limits but these can be configured and extended to provide a good sample of data to work with.
When configured correctly to search the entire web, you can send queries to your Custom Search Engine, in our case using PHP, and treat them like Google responses, albeit with some caveats. The main limitations of using a Custom Search Engine are: (i) it doesn’t use some Google Web Search features such as personalized results and; (ii) it may have a subset of results from the Google index if you include more than ten sites.
Notwithstanding these limitations, there are many search options that can be passed to the Custom Search Engine to proxy what you might expect Google.com to return. In our scenario, we passed the following when making a call:
https://www.googleapis.com/customsearch/v1?key=<google_api_id>&userIp= <ip_address>&cx<custom_search_engine_id>&q=iPhone+X&cr=countryUS&start= 1</custom_search_engine_id></ip_address></google_api_id>
Where:
https://www.googleapis.com/customsearch/v1 – is the URL for the Google Custom Search API
key=<GOOGLE_API_ID> – Your Google Developer API Key
userIp=<IP_ADDRESS> – The IP address of the local machine making the call
cx=<CUSTOM_SEARCH_ENGINE_ID> – Your Google Custom Search Engine ID
q=iPhone+X – The Google query string (‘+’ replaces ‘ ‘)
cr=countryUS – Country restriction (from Goolge’s Country Collection Name list)
start=1 – The index of the first result to return – e.g. SERP page 1. Successive calls would increment this to get pages 2–5.
Google has said that the Google Custom Search engine differs from Google .com, but in my limited prod testing comparing results between the two, I was encouraged by the similarities and so continued with the analysis. That said, keep in mind that the data and results below come from Google Custom Search (using ‘whole web’ queries), not Google.com.
Using the free Moz API account
Moz provide an Application Programming Interface (API). To use it you will need to register for a Mozscape API key, which is free but limited to 2,500 rows per month and one query every ten seconds. Current paid plans give you increased quotas and start at $250/month. Having a free account and API key, you can then query the Links API and analyze the following metrics:
Moz data field
Moz API code
Description
ueid
32
The number of external equity links to the URL
uid
2048
The number of links (external, equity or nonequity or not,) to the URL
umrp**
16384
The MozRank of the URL, as a normalized 10-point score
umrr**
16384
The MozRank of the URL, as a raw score
fmrp**
32768
The MozRank of the URL's subdomain, as a normalized 10-point score
fmrr**
32768
The MozRank of the URL's subdomain, as a raw score
us
536870912
The HTTP status code recorded for this URL, if available
upa
34359738368
A normalized 100-point score representing the likelihood of a page to rank well in search engine results
pda
68719476736
A normalized 100-point score representing the likelihood of a domain to rank well in search engine results
NOTE: Since this analysis was captured, Moz documented that they have deprecated these fields. However, in testing this (15-06-2019), the fields were still present.
Moz API Codes are added together before calling the Links API with something that looks like the following:
www.apple.com%2F?Cols=103616137253&AccessID=MOZ_ACCESS_ID& Expires=1560586149&Signature=<MOZ_SECRET_KEY>
Where:
http://lsapi.seomoz.com/linkscape/url-metrics/" class="redactor-autoparser-object">http://lsapi.seomoz.com/linksc... – Is the URL for the Moz API
http%3A%2F%2Fwww.apple.com%2F – An encoded URL that we want to get data on
Cols=103616137253 – The sum of the Moz API codes from the table above
AccessID=MOZ_ACCESS_ID – An encoded version of the Moz Access ID (found in your API account)
Expires=1560586149 – A timeout for the query - set a few minutes into the future
Signature=<MOZ_SECRET_KEY> – An encoded version of the Moz Access ID (found in your API account)
Moz will return with something like the following JSON:
Array ( [ut] => Apple [uu] => <a href="http://www.apple.com/" class="redactor-autoparser-object">www.apple.com/</a> [ueid] => 13078035 [uid] => 14632963 [uu] => www.apple.com/ [ueid] => 13078035 [uid] => 14632963 [umrp] => 9 [umrr] => 0.8999999762 [fmrp] => 2.602215052 [fmrr] => 0.2602215111 [us] => 200 [upa] => 90 [pda] => 100 )
For a great starting point on querying Moz with PHP, Perl, Python, Ruby and Javascript, see this repository on Github. I chose to use PHP.
Harvesting data with PHP and MySQL
Now we have a Google Custom Search Engine and our Moz API, we’re almost ready to capture data. Google and Moz respond to requests via the JSON format and so can be queried by many popular programming languages. In addition to my chosen language, PHP, I wrote the results of both Google and Moz to a database and chose MySQL Community Edition for this. Other databases could be also used, e.g. Postgres, Oracle, Microsoft SQL Server etc. Doing so enables persistence of the data and ad-hoc analysis using SQL (Structured Query Language) as well as other languages (like R, which I will go over later). After creating database tables to hold the Google search results (with fields for rank, URL etc.) and a table to hold Moz data fields (ueid, upa, uda etc.), we’re ready to design our data harvesting plan.
Google provide a generous quota with the Custom Search Engine (up to 100M queries per day with the same Google developer console key) but the Moz free API is limited to 2,500. Though for Moz, paid for options provide between 120k and 40M rows per month depending on plans and range in cost from $250–$10,000/month. Therefore, as I’m just exploring the free option, I designed my code to harvest 125 Google queries over 2 pages of SERPs (10 results per page) allowing me to stay within the Moz 2,500 row quota. As for which searches to fire at Google, there are numerous resources to use from. I chose to use Mondovo as they provide numerous lists by category and up to 500 words per list which is ample for the experiment.
I also rolled in a few PHP helper classes alongside my own code for database I/O and HTTP.
In summary, the main PHP building blocks and sources used were:
Google Custom Search Engine – Ash Kiswany wrote an excellent article using Jacob Fogg’s PHP interface for Google Custom Search;
Mozscape API – As mentioned, this PHP implementation for accessing Moz on Github was a good starting point;
Website crawler and HTTP – At Purple Toolz, we have our own crawler called PurpleToolzBot which uses Curl for HTTP and this Simple HTML DOM Parser;
Database I/O – PHP has excellent support for MySQL which I wrapped into classes from these tutorials.
One factor to be aware of is the 10 second interval between Moz API calls. This is to prevent Moz being overloaded by free API users. To handle this in software, I wrote a "query throttler" which blocked access to the Moz API between successive calls within a timeframe. However, whilst working perfectly it meant that calling Moz 2,500 times in succession took just under 7 hours to complete.
Analyzing data with SQL and R
Data harvested. Now the fun begins!
It’s time to have a look at what we’ve got. This is sometimes called data wrangling. I use a free statistical programming language called R along with a development environment (editor) called R Studio. There are other languages such as Stata and more graphical data science tools like Tableau, but these cost and the finance director at Purple Toolz isn’t someone to cross!
I have been using R for a number of years because it’s open source and it has many third-party libraries, making it extremely versatile and appropriate for this kind of work.
Let’s roll up our sleeves.
I now have a couple of database tables with the results of my 125 search term queries across 2 pages of SERPS (i.e. 20 ranked URLs per search term). Two database tables hold the Google results and another table holds the Moz data results. To access these, we’ll need to do a database INNER JOIN which we can easily accomplish by using the RMySQL package with R. This is loaded by typing "install.packages('RMySQL')" into R’s console and including the line "library(RMySQL)" at the top of our R script.
We can then do the following to connect and get the data into an R data frame variable called "theResults."
library(RMySQL) # INNER JOIN the two tables theQuery <- " SELECT A.*, B.*, C.* FROM ( SELECT cseq_search_id FROM cse_query ) A -- Custom Search Query INNER JOIN ( SELECT cser_cseq_id, cser_rank, cser_url FROM cse_results ) B -- Custom Search Results ON A.cseq_search_id = B.cser_cseq_id INNER JOIN ( SELECT * FROM moz ) C -- Moz Data Fields ON B.cser_url = C.moz_url ; " # [1] Connect to the database # Replace USER_NAME with your database username # Replace PASSWORD with your database password # Replace MY_DB with your database name theConn <- dbConnect(dbDriver("MySQL"), user = "USER_NAME", password = "PASSWORD", dbname = "MY_DB") # [2] Query the database and hold the results theResults <- dbGetQuery(theConn, theQuery) # [3] Disconnect from the database dbDisconnect(theConn)
NOTE: I have two tables to hold the Google Custom Search Engine data. One holds data on the Google query (cse_query) and one holds results (cse_results).
We can now use R’s full range of statistical functions to begin wrangling.
Let’s start with some summaries to get a feel for the data. The process I go through is basically the same for each of the fields, so let’s illustrate and use Moz’s ‘UEID’ field (the number of external equity links to a URL). By typing the following into R I get the this:
> summary(theResults$moz_ueid) Min. 1st Qu. Median Mean 3rd Qu. Max. 0 1 20 14709 182 2755274 > quantile(theResults$moz_ueid, probs = c(1, 5, 10, 25, 50, 75, 80, 90, 95, 99, 100)/100) 1% 5% 10% 25% 50% 75% 80% 90% 95% 99% 100% 0.0 0.0 0.0 1.0 20.0 182.0 337.2 1715.2 7873.4 412283.4 2755274.0
Looking at this, you can see that the data is skewed (a lot) by the relationship of the median to the mean, which is being pulled by values in the upper quartile range (values beyond 75% of the observations). We can however, plot this as a box and whisker plot in R where each X value is the distribution of UEIDs by rank from Google Custom Search position 1-20.
Note we are using a log scale on the y-axis so that we can display the full range of values as they vary a lot!
A box and whisker plot in R of Moz’s UEID by Google rank (note: log scale)
Box and whisker plots are great as they show a lot of information in them (see the geom_boxplot function in R). The purple boxed area represents the Inter-Quartile Range (IQR) which are the values between 25% and 75% of observations. The horizontal line in each ‘box’ represents the median value (the one in the middle when ordered), whilst the lines extending from the box (called the ‘whiskers’) represent 1.5x IQR. Dots outside the whiskers are called ‘outliers’ and show where the extents of each rank’s set of observations are. Despite the log scale, we can see a noticeable pull-up from rank #10 to rank #1 in median values, indicating that the number of equity links might be a Google ranking factor. Let’s explore this further with density plots.
Density plots are a lot like distributions (histograms) but show smooth lines rather than bars for the data. Much like a histogram, a density plot’s peak shows where the data values are concentrated and can help when comparing two distributions. In the density plot below, I have split the data into two categories: (i) results that appeared on Page 1 of SERPs ranked 1-10 are in pink and; (ii) results that appeared on SERP Page 2 are in blue. I have also plotted the medians of both distributions to help illustrate the difference in results between Page 1 and Page 2.
The inference from these two density plots is that Page 1 SERP results had more external equity backlinks (UEIDs) on than Page 2 results. You can also see the median values for these two categories below which clearly shows how the value for Page 1 (38) is far greater than Page 2 (11). So we now have some numbers to base our SEO strategy for backlinks on.
# Create a factor in R according to which SERP page a result (cser_rank) is on > theResults$rankBin <- paste("Page", ceiling(theResults$cser_rank / 10)) > theResults$rankBin <- factor(theResults$rankBin) # Now report the medians by SERP page by calling ‘tapply’ > tapply(theResults$moz_ueid, theResults$rankBin, median) Page 1 Page 2 38 11
From this, we can deduce that equity backlinks (UEID) matter and if I were advising a client based on this data, I would say they should be looking to get over 38 equity-based backlinks to help them get to Page 1 of SERPs. Of course, this is a limited sample and more research, a bigger sample and other ranking factors would need to be considered, but you get the idea.
Now let’s investigate another metric that has less of a range on it than UEID and look at Moz’s UPA measure, which is the likelihood that a page will rank well in search engine results.
> summary(theResults$moz_upa) Min. 1st Qu. Median Mean 3rd Qu. Max. 1.00 33.00 41.00 41.22 50.00 81.00 > quantile(theResults$moz_upa, probs = c(1, 5, 10, 25, 50, 75, 80, 90, 95, 99, 100)/100) 1% 5% 10% 25% 50% 75% 80% 90% 95% 99% 100% 12 20 25 33 41 50 53 58 62 75 81
UPA is a number given to a URL and ranges between 0–100. The data is better behaved than the previous UEID unbounded variable having its mean and median close together making for a more ‘normal’ distribution as we can see below by plotting a histogram in R.
A histogram of Moz’s UPA score
We’ll do the same Page 1 : Page 2 split and density plot that we did before and look at the UPA score distributions when we divide the UPA data into two groups.
# Report the medians by SERP page by calling ‘tapply’ > tapply(theResults$moz_upa, theResults$rankBin, median) Page 1 Page 2 43 39
In summary, two very different distributions from two Moz API variables. But both showed differences in their scores between SERP pages and provide you with tangible values (medians) to work with and ultimately advise clients on or apply to your own SEO.
Of course, this is just a small sample and shouldn’t be taken literally. But with free resources from both Google and Moz, you can now see how you can begin to develop analytical capabilities of your own to base your assumptions on rather than accepting the norm. SEO ranking factors change all the time and having your own analytical tools to conduct your own tests and experiments on will help give you credibility and perhaps even a unique insight on something hitherto unknown.
Google provide you with a healthy free quota to obtain search results from. If you need more than the 2,500 rows/month Moz provide for free there are numerous paid-for plans you can purchase. MySQL is a free download and R is also a free package for statistical analysis (and much more).
Go explore!
Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!
from The Moz Blog http://tracking.feedpress.it/link/9375/12915591
0 notes
Text
SEO Analytics for Free - Combining Google Search with the Moz API
Posted by Purple-Toolz
I’m a self-funded start-up business owner. As such, I want to get as much as I can for free before convincing our finance director to spend our hard-earned bootstrapping funds. I’m also an analyst with a background in data and computer science, so a bit of a geek by any definition.
What I try to do, with my SEO analyst hat on, is hunt down great sources of free data and wrangle it into something insightful. Why? Because there’s no value in basing client advice on conjecture. It’s far better to combine quality data with good analysis and help our clients better understand what’s important for them to focus on.
In this article, I will tell you how to get started using a few free resources and illustrate how to pull together unique analytics that provide useful insights for your blog articles if you’re a writer, your agency if you’re an SEO, or your website if you’re a client or owner doing SEO yourself.
The scenario I’m going to use is that I want analyze some SEO attributes (e.g. backlinks, Page Authority etc.) and look at their effect on Google ranking. I want to answer questions like “Do backlinks really matter in getting to Page 1 of SERPs?” and “What kind of Page Authority score do I really need to be in the top 10 results?” To do this, I will need to combine data from a number of Google searches with data on each result that has the SEO attributes in that I want to measure.
Let’s get started and work through how to combine the following tasks to achieve this, which can all be setup for free:
Querying with Google Custom Search Engine
Using the free Moz API account
Harvesting data with PHP and MySQL
Analyzing data with SQL and R
Querying with Google Custom Search Engine
We first need to query Google and get some results stored. To stay on the right side of Google’s terms of service, we’ll not be scraping Google.com directly but will instead use Google’s Custom Search feature. Google’s Custom Search is designed mainly to let website owners provide a Google like search widget on their website. However, there is also a REST based Google Search API that is free and lets you query Google and retrieve results in the popular JSON format. There are quota limits but these can be configured and extended to provide a good sample of data to work with.
When configured correctly to search the entire web, you can send queries to your Custom Search Engine, in our case using PHP, and treat them like Google responses, albeit with some caveats. The main limitations of using a Custom Search Engine are: (i) it doesn’t use some Google Web Search features such as personalized results and; (ii) it may have a subset of results from the Google index if you include more than ten sites.
Notwithstanding these limitations, there are many search options that can be passed to the Custom Search Engine to proxy what you might expect Google.com to return. In our scenario, we passed the following when making a call:
https://www.googleapis.com/customsearch/v1?key=<google_api_id>&userIp= <ip_address>&cx<custom_search_engine_id>&q=iPhone+X&cr=countryUS&start= 1</custom_search_engine_id></ip_address></google_api_id>
Where:
https://www.googleapis.com/customsearch/v1 – is the URL for the Google Custom Search API
key=<GOOGLE_API_ID> – Your Google Developer API Key
userIp=<IP_ADDRESS> – The IP address of the local machine making the call
cx=<CUSTOM_SEARCH_ENGINE_ID> – Your Google Custom Search Engine ID
q=iPhone+X – The Google query string (‘+’ replaces ‘ ‘)
cr=countryUS – Country restriction (from Goolge’s Country Collection Name list)
start=1 – The index of the first result to return – e.g. SERP page 1. Successive calls would increment this to get pages 2–5.
Google has said that the Google Custom Search engine differs from Google .com, but in my limited prod testing comparing results between the two, I was encouraged by the similarities and so continued with the analysis. That said, keep in mind that the data and results below come from Google Custom Search (using ‘whole web’ queries), not Google.com.
Using the free Moz API account
Moz provide an Application Programming Interface (API). To use it you will need to register for a Mozscape API key, which is free but limited to 2,500 rows per month and one query every ten seconds. Current paid plans give you increased quotas and start at $250/month. Having a free account and API key, you can then query the Links API and analyze the following metrics:
Moz data field
Moz API code
Description
ueid
32
The number of external equity links to the URL
uid
2048
The number of links (external, equity or nonequity or not,) to the URL
umrp**
16384
The MozRank of the URL, as a normalized 10-point score
umrr**
16384
The MozRank of the URL, as a raw score
fmrp**
32768
The MozRank of the URL's subdomain, as a normalized 10-point score
fmrr**
32768
The MozRank of the URL's subdomain, as a raw score
us
536870912
The HTTP status code recorded for this URL, if available
upa
34359738368
A normalized 100-point score representing the likelihood of a page to rank well in search engine results
pda
68719476736
A normalized 100-point score representing the likelihood of a domain to rank well in search engine results
NOTE: Since this analysis was captured, Moz documented that they have deprecated these fields. However, in testing this (15-06-2019), the fields were still present.
Moz API Codes are added together before calling the Links API with something that looks like the following:
www.apple.com%2F?Cols=103616137253&AccessID=MOZ_ACCESS_ID& Expires=1560586149&Signature=<MOZ_SECRET_KEY>
Where:
https://ift.tt/1bbWaai" class="redactor-autoparser-object">https://ift.tt/2oVcks4... – Is the URL for the Moz API
http%3A%2F%2Fwww.apple.com%2F – An encoded URL that we want to get data on
Cols=103616137253 – The sum of the Moz API codes from the table above
AccessID=MOZ_ACCESS_ID – An encoded version of the Moz Access ID (found in your API account)
Expires=1560586149 – A timeout for the query - set a few minutes into the future
Signature=<MOZ_SECRET_KEY> – An encoded version of the Moz Access ID (found in your API account)
Moz will return with something like the following JSON:
Array ( [ut] => Apple [uu] => <a href="http://www.apple.com/" class="redactor-autoparser-object">www.apple.com/</a> [ueid] => 13078035 [uid] => 14632963 [uu] => www.apple.com/ [ueid] => 13078035 [uid] => 14632963 [umrp] => 9 [umrr] => 0.8999999762 [fmrp] => 2.602215052 [fmrr] => 0.2602215111 [us] => 200 [upa] => 90 [pda] => 100 )
For a great starting point on querying Moz with PHP, Perl, Python, Ruby and Javascript, see this repository on Github. I chose to use PHP.
Harvesting data with PHP and MySQL
Now we have a Google Custom Search Engine and our Moz API, we’re almost ready to capture data. Google and Moz respond to requests via the JSON format and so can be queried by many popular programming languages. In addition to my chosen language, PHP, I wrote the results of both Google and Moz to a database and chose MySQL Community Edition for this. Other databases could be also used, e.g. Postgres, Oracle, Microsoft SQL Server etc. Doing so enables persistence of the data and ad-hoc analysis using SQL (Structured Query Language) as well as other languages (like R, which I will go over later). After creating database tables to hold the Google search results (with fields for rank, URL etc.) and a table to hold Moz data fields (ueid, upa, uda etc.), we’re ready to design our data harvesting plan.
Google provide a generous quota with the Custom Search Engine (up to 100M queries per day with the same Google developer console key) but the Moz free API is limited to 2,500. Though for Moz, paid for options provide between 120k and 40M rows per month depending on plans and range in cost from $250–$10,000/month. Therefore, as I’m just exploring the free option, I designed my code to harvest 125 Google queries over 2 pages of SERPs (10 results per page) allowing me to stay within the Moz 2,500 row quota. As for which searches to fire at Google, there are numerous resources to use from. I chose to use Mondovo as they provide numerous lists by category and up to 500 words per list which is ample for the experiment.
I also rolled in a few PHP helper classes alongside my own code for database I/O and HTTP.
In summary, the main PHP building blocks and sources used were:
Google Custom Search Engine – Ash Kiswany wrote an excellent article using Jacob Fogg’s PHP interface for Google Custom Search;
Mozscape API – As mentioned, this PHP implementation for accessing Moz on Github was a good starting point;
Website crawler and HTTP – At Purple Toolz, we have our own crawler called PurpleToolzBot which uses Curl for HTTP and this Simple HTML DOM Parser;
Database I/O – PHP has excellent support for MySQL which I wrapped into classes from these tutorials.
One factor to be aware of is the 10 second interval between Moz API calls. This is to prevent Moz being overloaded by free API users. To handle this in software, I wrote a "query throttler" which blocked access to the Moz API between successive calls within a timeframe. However, whilst working perfectly it meant that calling Moz 2,500 times in succession took just under 7 hours to complete.
Analyzing data with SQL and R
Data harvested. Now the fun begins!
It’s time to have a look at what we’ve got. This is sometimes called data wrangling. I use a free statistical programming language called R along with a development environment (editor) called R Studio. There are other languages such as Stata and more graphical data science tools like Tableau, but these cost and the finance director at Purple Toolz isn’t someone to cross!
I have been using R for a number of years because it’s open source and it has many third-party libraries, making it extremely versatile and appropriate for this kind of work.
Let’s roll up our sleeves.
I now have a couple of database tables with the results of my 125 search term queries across 2 pages of SERPS (i.e. 20 ranked URLs per search term). Two database tables hold the Google results and another table holds the Moz data results. To access these, we’ll need to do a database INNER JOIN which we can easily accomplish by using the RMySQL package with R. This is loaded by typing "install.packages('RMySQL')" into R’s console and including the line "library(RMySQL)" at the top of our R script.
We can then do the following to connect and get the data into an R data frame variable called "theResults."
library(RMySQL) # INNER JOIN the two tables theQuery <- " SELECT A.*, B.*, C.* FROM ( SELECT cseq_search_id FROM cse_query ) A -- Custom Search Query INNER JOIN ( SELECT cser_cseq_id, cser_rank, cser_url FROM cse_results ) B -- Custom Search Results ON A.cseq_search_id = B.cser_cseq_id INNER JOIN ( SELECT * FROM moz ) C -- Moz Data Fields ON B.cser_url = C.moz_url ; " # [1] Connect to the database # Replace USER_NAME with your database username # Replace PASSWORD with your database password # Replace MY_DB with your database name theConn <- dbConnect(dbDriver("MySQL"), user = "USER_NAME", password = "PASSWORD", dbname = "MY_DB") # [2] Query the database and hold the results theResults <- dbGetQuery(theConn, theQuery) # [3] Disconnect from the database dbDisconnect(theConn)
NOTE: I have two tables to hold the Google Custom Search Engine data. One holds data on the Google query (cse_query) and one holds results (cse_results).
We can now use R’s full range of statistical functions to begin wrangling.
Let’s start with some summaries to get a feel for the data. The process I go through is basically the same for each of the fields, so let’s illustrate and use Moz’s ‘UEID’ field (the number of external equity links to a URL). By typing the following into R I get the this:
> summary(theResults$moz_ueid) Min. 1st Qu. Median Mean 3rd Qu. Max. 0 1 20 14709 182 2755274 > quantile(theResults$moz_ueid, probs = c(1, 5, 10, 25, 50, 75, 80, 90, 95, 99, 100)/100) 1% 5% 10% 25% 50% 75% 80% 90% 95% 99% 100% 0.0 0.0 0.0 1.0 20.0 182.0 337.2 1715.2 7873.4 412283.4 2755274.0
Looking at this, you can see that the data is skewed (a lot) by the relationship of the median to the mean, which is being pulled by values in the upper quartile range (values beyond 75% of the observations). We can however, plot this as a box and whisker plot in R where each X value is the distribution of UEIDs by rank from Google Custom Search position 1-20.
Note we are using a log scale on the y-axis so that we can display the full range of values as they vary a lot!
A box and whisker plot in R of Moz’s UEID by Google rank (note: log scale)
Box and whisker plots are great as they show a lot of information in them (see the geom_boxplot function in R). The purple boxed area represents the Inter-Quartile Range (IQR) which are the values between 25% and 75% of observations. The horizontal line in each ‘box’ represents the median value (the one in the middle when ordered), whilst the lines extending from the box (called the ‘whiskers’) represent 1.5x IQR. Dots outside the whiskers are called ‘outliers’ and show where the extents of each rank’s set of observations are. Despite the log scale, we can see a noticeable pull-up from rank #10 to rank #1 in median values, indicating that the number of equity links might be a Google ranking factor. Let’s explore this further with density plots.
Density plots are a lot like distributions (histograms) but show smooth lines rather than bars for the data. Much like a histogram, a density plot’s peak shows where the data values are concentrated and can help when comparing two distributions. In the density plot below, I have split the data into two categories: (i) results that appeared on Page 1 of SERPs ranked 1-10 are in pink and; (ii) results that appeared on SERP Page 2 are in blue. I have also plotted the medians of both distributions to help illustrate the difference in results between Page 1 and Page 2.
The inference from these two density plots is that Page 1 SERP results had more external equity backlinks (UEIDs) on than Page 2 results. You can also see the median values for these two categories below which clearly shows how the value for Page 1 (38) is far greater than Page 2 (11). So we now have some numbers to base our SEO strategy for backlinks on.
# Create a factor in R according to which SERP page a result (cser_rank) is on > theResults$rankBin <- paste("Page", ceiling(theResults$cser_rank / 10)) > theResults$rankBin <- factor(theResults$rankBin) # Now report the medians by SERP page by calling ‘tapply’ > tapply(theResults$moz_ueid, theResults$rankBin, median) Page 1 Page 2 38 11
From this, we can deduce that equity backlinks (UEID) matter and if I were advising a client based on this data, I would say they should be looking to get over 38 equity-based backlinks to help them get to Page 1 of SERPs. Of course, this is a limited sample and more research, a bigger sample and other ranking factors would need to be considered, but you get the idea.
Now let’s investigate another metric that has less of a range on it than UEID and look at Moz’s UPA measure, which is the likelihood that a page will rank well in search engine results.
> summary(theResults$moz_upa) Min. 1st Qu. Median Mean 3rd Qu. Max. 1.00 33.00 41.00 41.22 50.00 81.00 > quantile(theResults$moz_upa, probs = c(1, 5, 10, 25, 50, 75, 80, 90, 95, 99, 100)/100) 1% 5% 10% 25% 50% 75% 80% 90% 95% 99% 100% 12 20 25 33 41 50 53 58 62 75 81
UPA is a number given to a URL and ranges between 0–100. The data is better behaved than the previous UEID unbounded variable having its mean and median close together making for a more ‘normal’ distribution as we can see below by plotting a histogram in R.
A histogram of Moz’s UPA score
We’ll do the same Page 1 : Page 2 split and density plot that we did before and look at the UPA score distributions when we divide the UPA data into two groups.
# Report the medians by SERP page by calling ‘tapply’ > tapply(theResults$moz_upa, theResults$rankBin, median) Page 1 Page 2 43 39
In summary, two very different distributions from two Moz API variables. But both showed differences in their scores between SERP pages and provide you with tangible values (medians) to work with and ultimately advise clients on or apply to your own SEO.
Of course, this is just a small sample and shouldn’t be taken literally. But with free resources from both Google and Moz, you can now see how you can begin to develop analytical capabilities of your own to base your assumptions on rather than accepting the norm. SEO ranking factors change all the time and having your own analytical tools to conduct your own tests and experiments on will help give you credibility and perhaps even a unique insight on something hitherto unknown.
Google provide you with a healthy free quota to obtain search results from. If you need more than the 2,500 rows/month Moz provide for free there are numerous paid-for plans you can purchase. MySQL is a free download and R is also a free package for statistical analysis (and much more).
Go explore!
Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!
from The Moz Blog https://ift.tt/2JeY5Fh via IFTTT
0 notes
Text
SEO Analytics for Free - Combining Google Search with the Moz API
Posted by Purple-Toolz
I’m a self-funded start-up business owner. As such, I want to get as much as I can for free before convincing our finance director to spend our hard-earned bootstrapping funds. I’m also an analyst with a background in data and computer science, so a bit of a geek by any definition.
What I try to do, with my SEO analyst hat on, is hunt down great sources of free data and wrangle it into something insightful. Why? Because there’s no value in basing client advice on conjecture. It’s far better to combine quality data with good analysis and help our clients better understand what’s important for them to focus on.
In this article, I will tell you how to get started using a few free resources and illustrate how to pull together unique analytics that provide useful insights for your blog articles if you’re a writer, your agency if you’re an SEO, or your website if you’re a client or owner doing SEO yourself.
The scenario I’m going to use is that I want analyze some SEO attributes (e.g. backlinks, Page Authority etc.) and look at their effect on Google ranking. I want to answer questions like “Do backlinks really matter in getting to Page 1 of SERPs?” and “What kind of Page Authority score do I really need to be in the top 10 results?” To do this, I will need to combine data from a number of Google searches with data on each result that has the SEO attributes in that I want to measure.
Let’s get started and work through how to combine the following tasks to achieve this, which can all be setup for free:
Querying with Google Custom Search Engine
Using the free Moz API account
Harvesting data with PHP and MySQL
Analyzing data with SQL and R
Querying with Google Custom Search Engine
We first need to query Google and get some results stored. To stay on the right side of Google’s terms of service, we’ll not be scraping Google.com directly but will instead use Google’s Custom Search feature. Google’s Custom Search is designed mainly to let website owners provide a Google like search widget on their website. However, there is also a REST based Google Search API that is free and lets you query Google and retrieve results in the popular JSON format. There are quota limits but these can be configured and extended to provide a good sample of data to work with.
When configured correctly to search the entire web, you can send queries to your Custom Search Engine, in our case using PHP, and treat them like Google responses, albeit with some caveats. The main limitations of using a Custom Search Engine are: (i) it doesn’t use some Google Web Search features such as personalized results and; (ii) it may have a subset of results from the Google index if you include more than ten sites.
Notwithstanding these limitations, there are many search options that can be passed to the Custom Search Engine to proxy what you might expect Google.com to return. In our scenario, we passed the following when making a call:
https://www.googleapis.com/customsearch/v1?key=<google_api_id>&userIp= <ip_address>&cx<custom_search_engine_id>&q=iPhone+X&cr=countryUS&start= 1</custom_search_engine_id></ip_address></google_api_id>
Where:
https://www.googleapis.com/customsearch/v1 – is the URL for the Google Custom Search API
key=<GOOGLE_API_ID> – Your Google Developer API Key
userIp=<IP_ADDRESS> – The IP address of the local machine making the call
cx=<CUSTOM_SEARCH_ENGINE_ID> – Your Google Custom Search Engine ID
q=iPhone+X – The Google query string (‘+’ replaces ‘ ‘)
cr=countryUS – Country restriction (from Goolge’s Country Collection Name list)
start=1 – The index of the first result to return – e.g. SERP page 1. Successive calls would increment this to get pages 2–5.
Google has said that the Google Custom Search engine differs from Google .com, but in my limited prod testing comparing results between the two, I was encouraged by the similarities and so continued with the analysis. That said, keep in mind that the data and results below come from Google Custom Search (using ‘whole web’ queries), not Google.com.
Using the free Moz API account
Moz provide an Application Programming Interface (API). To use it you will need to register for a Mozscape API key, which is free but limited to 2,500 rows per month and one query every ten seconds. Current paid plans give you increased quotas and start at $250/month. Having a free account and API key, you can then query the Links API and analyze the following metrics:
Moz data field
Moz API code
Description
ueid
32
The number of external equity links to the URL
uid
2048
The number of links (external, equity or nonequity or not,) to the URL
umrp**
16384
The MozRank of the URL, as a normalized 10-point score
umrr**
16384
The MozRank of the URL, as a raw score
fmrp**
32768
The MozRank of the URL's subdomain, as a normalized 10-point score
fmrr**
32768
The MozRank of the URL's subdomain, as a raw score
us
536870912
The HTTP status code recorded for this URL, if available
upa
34359738368
A normalized 100-point score representing the likelihood of a page to rank well in search engine results
pda
68719476736
A normalized 100-point score representing the likelihood of a domain to rank well in search engine results
NOTE: Since this analysis was captured, Moz documented that they have deprecated these fields. However, in testing this (15-06-2019), the fields were still present.
Moz API Codes are added together before calling the Links API with something that looks like the following:
www.apple.com%2F?Cols=103616137253&AccessID=MOZ_ACCESS_ID& Expires=1560586149&Signature=<MOZ_SECRET_KEY>
Where:
https://ift.tt/1bbWaai" class="redactor-autoparser-object">https://ift.tt/2oVcks4... – Is the URL for the Moz API
http%3A%2F%2Fwww.apple.com%2F – An encoded URL that we want to get data on
Cols=103616137253 – The sum of the Moz API codes from the table above
AccessID=MOZ_ACCESS_ID – An encoded version of the Moz Access ID (found in your API account)
Expires=1560586149 – A timeout for the query - set a few minutes into the future
Signature=<MOZ_SECRET_KEY> – An encoded version of the Moz Access ID (found in your API account)
Moz will return with something like the following JSON:
Array ( [ut] => Apple [uu] => <a href="http://www.apple.com/" class="redactor-autoparser-object">www.apple.com/</a> [ueid] => 13078035 [uid] => 14632963 [uu] => www.apple.com/ [ueid] => 13078035 [uid] => 14632963 [umrp] => 9 [umrr] => 0.8999999762 [fmrp] => 2.602215052 [fmrr] => 0.2602215111 [us] => 200 [upa] => 90 [pda] => 100 )
For a great starting point on querying Moz with PHP, Perl, Python, Ruby and Javascript, see this repository on Github. I chose to use PHP.
Harvesting data with PHP and MySQL
Now we have a Google Custom Search Engine and our Moz API, we’re almost ready to capture data. Google and Moz respond to requests via the JSON format and so can be queried by many popular programming languages. In addition to my chosen language, PHP, I wrote the results of both Google and Moz to a database and chose MySQL Community Edition for this. Other databases could be also used, e.g. Postgres, Oracle, Microsoft SQL Server etc. Doing so enables persistence of the data and ad-hoc analysis using SQL (Structured Query Language) as well as other languages (like R, which I will go over later). After creating database tables to hold the Google search results (with fields for rank, URL etc.) and a table to hold Moz data fields (ueid, upa, uda etc.), we’re ready to design our data harvesting plan.
Google provide a generous quota with the Custom Search Engine (up to 100M queries per day with the same Google developer console key) but the Moz free API is limited to 2,500. Though for Moz, paid for options provide between 120k and 40M rows per month depending on plans and range in cost from $250–$10,000/month. Therefore, as I’m just exploring the free option, I designed my code to harvest 125 Google queries over 2 pages of SERPs (10 results per page) allowing me to stay within the Moz 2,500 row quota. As for which searches to fire at Google, there are numerous resources to use from. I chose to use Mondovo as they provide numerous lists by category and up to 500 words per list which is ample for the experiment.
I also rolled in a few PHP helper classes alongside my own code for database I/O and HTTP.
In summary, the main PHP building blocks and sources used were:
Google Custom Search Engine – Ash Kiswany wrote an excellent article using Jacob Fogg’s PHP interface for Google Custom Search;
Mozscape API – As mentioned, this PHP implementation for accessing Moz on Github was a good starting point;
Website crawler and HTTP – At Purple Toolz, we have our own crawler called PurpleToolzBot which uses Curl for HTTP and this Simple HTML DOM Parser;
Database I/O – PHP has excellent support for MySQL which I wrapped into classes from these tutorials.
One factor to be aware of is the 10 second interval between Moz API calls. This is to prevent Moz being overloaded by free API users. To handle this in software, I wrote a "query throttler" which blocked access to the Moz API between successive calls within a timeframe. However, whilst working perfectly it meant that calling Moz 2,500 times in succession took just under 7 hours to complete.
Analyzing data with SQL and R
Data harvested. Now the fun begins!
It’s time to have a look at what we’ve got. This is sometimes called data wrangling. I use a free statistical programming language called R along with a development environment (editor) called R Studio. There are other languages such as Stata and more graphical data science tools like Tableau, but these cost and the finance director at Purple Toolz isn’t someone to cross!
I have been using R for a number of years because it’s open source and it has many third-party libraries, making it extremely versatile and appropriate for this kind of work.
Let’s roll up our sleeves.
I now have a couple of database tables with the results of my 125 search term queries across 2 pages of SERPS (i.e. 20 ranked URLs per search term). Two database tables hold the Google results and another table holds the Moz data results. To access these, we’ll need to do a database INNER JOIN which we can easily accomplish by using the RMySQL package with R. This is loaded by typing "install.packages('RMySQL')" into R’s console and including the line "library(RMySQL)" at the top of our R script.
We can then do the following to connect and get the data into an R data frame variable called "theResults."
library(RMySQL) # INNER JOIN the two tables theQuery <- " SELECT A.*, B.*, C.* FROM ( SELECT cseq_search_id FROM cse_query ) A -- Custom Search Query INNER JOIN ( SELECT cser_cseq_id, cser_rank, cser_url FROM cse_results ) B -- Custom Search Results ON A.cseq_search_id = B.cser_cseq_id INNER JOIN ( SELECT * FROM moz ) C -- Moz Data Fields ON B.cser_url = C.moz_url ; " # [1] Connect to the database # Replace USER_NAME with your database username # Replace PASSWORD with your database password # Replace MY_DB with your database name theConn <- dbConnect(dbDriver("MySQL"), user = "USER_NAME", password = "PASSWORD", dbname = "MY_DB") # [2] Query the database and hold the results theResults <- dbGetQuery(theConn, theQuery) # [3] Disconnect from the database dbDisconnect(theConn)
NOTE: I have two tables to hold the Google Custom Search Engine data. One holds data on the Google query (cse_query) and one holds results (cse_results).
We can now use R’s full range of statistical functions to begin wrangling.
Let’s start with some summaries to get a feel for the data. The process I go through is basically the same for each of the fields, so let’s illustrate and use Moz’s ‘UEID’ field (the number of external equity links to a URL). By typing the following into R I get the this:
> summary(theResults$moz_ueid) Min. 1st Qu. Median Mean 3rd Qu. Max. 0 1 20 14709 182 2755274 > quantile(theResults$moz_ueid, probs = c(1, 5, 10, 25, 50, 75, 80, 90, 95, 99, 100)/100) 1% 5% 10% 25% 50% 75% 80% 90% 95% 99% 100% 0.0 0.0 0.0 1.0 20.0 182.0 337.2 1715.2 7873.4 412283.4 2755274.0
Looking at this, you can see that the data is skewed (a lot) by the relationship of the median to the mean, which is being pulled by values in the upper quartile range (values beyond 75% of the observations). We can however, plot this as a box and whisker plot in R where each X value is the distribution of UEIDs by rank from Google Custom Search position 1-20.
Note we are using a log scale on the y-axis so that we can display the full range of values as they vary a lot!
A box and whisker plot in R of Moz’s UEID by Google rank (note: log scale)
Box and whisker plots are great as they show a lot of information in them (see the geom_boxplot function in R). The purple boxed area represents the Inter-Quartile Range (IQR) which are the values between 25% and 75% of observations. The horizontal line in each ‘box’ represents the median value (the one in the middle when ordered), whilst the lines extending from the box (called the ‘whiskers’) represent 1.5x IQR. Dots outside the whiskers are called ‘outliers’ and show where the extents of each rank’s set of observations are. Despite the log scale, we can see a noticeable pull-up from rank #10 to rank #1 in median values, indicating that the number of equity links might be a Google ranking factor. Let’s explore this further with density plots.
Density plots are a lot like distributions (histograms) but show smooth lines rather than bars for the data. Much like a histogram, a density plot’s peak shows where the data values are concentrated and can help when comparing two distributions. In the density plot below, I have split the data into two categories: (i) results that appeared on Page 1 of SERPs ranked 1-10 are in pink and; (ii) results that appeared on SERP Page 2 are in blue. I have also plotted the medians of both distributions to help illustrate the difference in results between Page 1 and Page 2.
The inference from these two density plots is that Page 1 SERP results had more external equity backlinks (UEIDs) on than Page 2 results. You can also see the median values for these two categories below which clearly shows how the value for Page 1 (38) is far greater than Page 2 (11). So we now have some numbers to base our SEO strategy for backlinks on.
# Create a factor in R according to which SERP page a result (cser_rank) is on > theResults$rankBin <- paste("Page", ceiling(theResults$cser_rank / 10)) > theResults$rankBin <- factor(theResults$rankBin) # Now report the medians by SERP page by calling ‘tapply’ > tapply(theResults$moz_ueid, theResults$rankBin, median) Page 1 Page 2 38 11
From this, we can deduce that equity backlinks (UEID) matter and if I were advising a client based on this data, I would say they should be looking to get over 38 equity-based backlinks to help them get to Page 1 of SERPs. Of course, this is a limited sample and more research, a bigger sample and other ranking factors would need to be considered, but you get the idea.
Now let’s investigate another metric that has less of a range on it than UEID and look at Moz’s UPA measure, which is the likelihood that a page will rank well in search engine results.
> summary(theResults$moz_upa) Min. 1st Qu. Median Mean 3rd Qu. Max. 1.00 33.00 41.00 41.22 50.00 81.00 > quantile(theResults$moz_upa, probs = c(1, 5, 10, 25, 50, 75, 80, 90, 95, 99, 100)/100) 1% 5% 10% 25% 50% 75% 80% 90% 95% 99% 100% 12 20 25 33 41 50 53 58 62 75 81
UPA is a number given to a URL and ranges between 0–100. The data is better behaved than the previous UEID unbounded variable having its mean and median close together making for a more ‘normal’ distribution as we can see below by plotting a histogram in R.
A histogram of Moz’s UPA score
We’ll do the same Page 1 : Page 2 split and density plot that we did before and look at the UPA score distributions when we divide the UPA data into two groups.
# Report the medians by SERP page by calling ‘tapply’ > tapply(theResults$moz_upa, theResults$rankBin, median) Page 1 Page 2 43 39
In summary, two very different distributions from two Moz API variables. But both showed differences in their scores between SERP pages and provide you with tangible values (medians) to work with and ultimately advise clients on or apply to your own SEO.
Of course, this is just a small sample and shouldn’t be taken literally. But with free resources from both Google and Moz, you can now see how you can begin to develop analytical capabilities of your own to base your assumptions on rather than accepting the norm. SEO ranking factors change all the time and having your own analytical tools to conduct your own tests and experiments on will help give you credibility and perhaps even a unique insight on something hitherto unknown.
Google provide you with a healthy free quota to obtain search results from. If you need more than the 2,500 rows/month Moz provide for free there are numerous paid-for plans you can purchase. MySQL is a free download and R is also a free package for statistical analysis (and much more).
Go explore!
Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!
0 notes
Text
SEO Analytics for Free - Combining Google Search with the Moz API
Posted by Purple-Toolz
I’m a self-funded start-up business owner. As such, I want to get as much as I can for free before convincing our finance director to spend our hard-earned bootstrapping funds. I’m also an analyst with a background in data and computer science, so a bit of a geek by any definition.
What I try to do, with my SEO analyst hat on, is hunt down great sources of free data and wrangle it into something insightful. Why? Because there’s no value in basing client advice on conjecture. It’s far better to combine quality data with good analysis and help our clients better understand what’s important for them to focus on.
In this article, I will tell you how to get started using a few free resources and illustrate how to pull together unique analytics that provide useful insights for your blog articles if you’re a writer, your agency if you’re an SEO, or your website if you’re a client or owner doing SEO yourself.
The scenario I’m going to use is that I want analyze some SEO attributes (e.g. backlinks, Page Authority etc.) and look at their effect on Google ranking. I want to answer questions like “Do backlinks really matter in getting to Page 1 of SERPs?” and “What kind of Page Authority score do I really need to be in the top 10 results?” To do this, I will need to combine data from a number of Google searches with data on each result that has the SEO attributes in that I want to measure.
Let’s get started and work through how to combine the following tasks to achieve this, which can all be setup for free:
Querying with Google Custom Search Engine
Using the free Moz API account
Harvesting data with PHP and MySQL
Analyzing data with SQL and R
Querying with Google Custom Search Engine
We first need to query Google and get some results stored. To stay on the right side of Google’s terms of service, we’ll not be scraping Google.com directly but will instead use Google’s Custom Search feature. Google’s Custom Search is designed mainly to let website owners provide a Google like search widget on their website. However, there is also a REST based Google Search API that is free and lets you query Google and retrieve results in the popular JSON format. There are quota limits but these can be configured and extended to provide a good sample of data to work with.
When configured correctly to search the entire web, you can send queries to your Custom Search Engine, in our case using PHP, and treat them like Google responses, albeit with some caveats. The main limitations of using a Custom Search Engine are: (i) it doesn’t use some Google Web Search features such as personalized results and; (ii) it may have a subset of results from the Google index if you include more than ten sites.
Notwithstanding these limitations, there are many search options that can be passed to the Custom Search Engine to proxy what you might expect Google.com to return. In our scenario, we passed the following when making a call:
https://www.googleapis.com/customsearch/v1?key=<google_api_id>&userIp= <ip_address>&cx<custom_search_engine_id>&q=iPhone+X&cr=countryUS&start= 1</custom_search_engine_id></ip_address></google_api_id>
Where:
https://www.googleapis.com/customsearch/v1 – is the URL for the Google Custom Search API
key=<GOOGLE_API_ID> – Your Google Developer API Key
userIp=<IP_ADDRESS> – The IP address of the local machine making the call
cx=<CUSTOM_SEARCH_ENGINE_ID> – Your Google Custom Search Engine ID
q=iPhone+X – The Google query string (‘+’ replaces ‘ ‘)
cr=countryUS – Country restriction (from Google’s Country Collection Name list)
start=1 – The index of the first result to return – e.g. SERP page 1. Successive calls increase this by 10 (start=11, 21, and so on) to retrieve subsequent SERP pages.
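Putting these parameters together, here is a minimal sketch of such a request made from R (the author’s harvesting code was written in PHP; the httr and jsonlite packages are assumed to be installed, and the key and engine ID are placeholders):

library(httr)
library(jsonlite)

# Query the Custom Search JSON API for one page of results
res <- GET(
  "https://www.googleapis.com/customsearch/v1",
  query = list(
    key   = "GOOGLE_API_ID",             # your Google Developer API key (placeholder)
    cx    = "CUSTOM_SEARCH_ENGINE_ID",   # your Custom Search Engine ID (placeholder)
    q     = "iPhone X",                  # httr URL-encodes the space for you
    cr    = "countryUS",                 # country restriction
    start = 1                            # 1 for SERP page 1, 11 for page 2, and so on
  )
)

# Parse the JSON response; each item carries the ranked URL, title and snippet
searchData <- fromJSON(content(res, as = "text", encoding = "UTF-8"))
head(searchData$items[, c("link", "title")])

The same request can, of course, be made from PHP with cURL, which is what the harvesting code described later does.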
Google has said that the Google Custom Search engine differs from Google.com, but in my limited testing comparing results between the two, I was encouraged by the similarities and so continued with the analysis. That said, keep in mind that the data and results below come from Google Custom Search (using ‘whole web’ queries), not Google.com.
Using the free Moz API account
Moz provide an Application Programming Interface (API). To use it you will need to register for a Mozscape API key, which is free but limited to 2,500 rows per month and one query every ten seconds. Current paid plans give you increased quotas and start at $250/month. Having a free account and API key, you can then query the Links API and analyze the following metrics:
Moz data field (API code) – Description

ueid (32) – The number of external equity links to the URL
uid (2048) – The number of links (external, equity or non-equity) to the URL
umrp** (16384) – The MozRank of the URL, as a normalized 10-point score
umrr** (16384) – The MozRank of the URL, as a raw score
fmrp** (32768) – The MozRank of the URL's subdomain, as a normalized 10-point score
fmrr** (32768) – The MozRank of the URL's subdomain, as a raw score
us (536870912) – The HTTP status code recorded for this URL, if available
upa (34359738368) – A normalized 100-point score representing the likelihood of a page to rank well in search engine results
pda (68719476736) – A normalized 100-point score representing the likelihood of a domain to rank well in search engine results
NOTE: Since this analysis was captured, Moz has documented that the fields marked ** above are deprecated. However, when I tested on 15-06-2019, the fields were still being returned.
Moz API Codes are added together before calling the Links API with something that looks like the following:
http://lsapi.seomoz.com/linkscape/url-metrics/http%3A%2F%2Fwww.apple.com%2F?Cols=103616137253&AccessID=MOZ_ACCESS_ID&Expires=1560586149&Signature=<SIGNATURE>
Where:
http://lsapi.seomoz.com/linkscape/url-metrics/ – The URL for the Moz Links API
http%3A%2F%2Fwww.apple.com%2F – An encoded URL that we want to get data on
Cols=103616137253 – The sum of the Moz API codes from the table above (a quick check of this sum is sketched after this list)
AccessID=MOZ_ACCESS_ID – An encoded version of the Moz Access ID (found in your API account)
Expires=1560586149 – A Unix timestamp at which the query expires – set this a few minutes into the future
Signature=<SIGNATURE> – A URL-encoded signature generated from your Access ID, the Expires value, and your Moz secret key (found in your API account)
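As a quick sanity check on the Cols value, the API codes from the table can be summed in R. Attributing the small remainder to the URL title and canonical-URL flags is my assumption, based on the ut and uu fields that appear in the response below:

# Sum the Moz API codes used in this analysis (umrr/fmrr share codes with umrp/fmrp,
# so each code is only counted once)
codes <- c(ueid = 32, uid = 2048, umrp = 16384, fmrp = 32768,
           us = 536870912, upa = 34359738368, pda = 68719476736)
format(sum(codes), scientific = FALSE)   # "103616137248"
# The example request uses 103616137253; the extra 5 presumably adds the low-value
# flags for the URL title and canonical URL, hence the ut and uu fields in the response.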
Moz responds with JSON, which, once decoded in PHP, looks something like the following:
Array
(
    [ut] => Apple
    [uu] => www.apple.com/
    [ueid] => 13078035
    [uid] => 14632963
    [umrp] => 9
    [umrr] => 0.8999999762
    [fmrp] => 2.602215052
    [fmrr] => 0.2602215111
    [us] => 200
    [upa] => 90
    [pda] => 100
)
For a great starting point on querying Moz with PHP, Perl, Python, Ruby and Javascript, see this repository on Github. I chose to use PHP.
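For readers who prefer to stay in R for harvesting as well, here is a minimal sketch of a signed Moz request. It assumes the legacy Mozscape signed-authentication scheme (a base64-encoded HMAC-SHA1 of your Access ID and the Expires timestamp, signed with your secret key) and uses the httr and openssl packages; the credentials are placeholders.

library(httr)
library(openssl)

access_id  <- "MOZ_ACCESS_ID"                 # placeholder
secret_key <- "MOZ_SECRET_KEY"                # placeholder - only used to sign, never sent
expires    <- as.integer(Sys.time()) + 300    # a few minutes into the future
cols       <- "103616137253"                  # sum of the Moz API codes chosen above

# Assumed signing scheme: base64(HMAC-SHA1("AccessID\nExpires", secret_key))
stringToSign <- paste0(access_id, "\n", expires)
signature <- base64_encode(sha1(charToRaw(stringToSign), key = charToRaw(secret_key)))

# The target URL is itself URL-encoded and appended to the API endpoint
target <- URLencode("http://www.apple.com/", reserved = TRUE)

res <- GET(
  paste0("http://lsapi.seomoz.com/linkscape/url-metrics/", target),
  query = list(
    Cols      = cols,
    AccessID  = access_id,
    Expires   = expires,
    Signature = signature   # httr URL-encodes the base64 characters for us
  )
)
mozData <- content(res, as = "parsed")   # a named list of the requested metrics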
Harvesting data with PHP and MySQL
Now that we have a Google Custom Search Engine and our Moz API key, we’re almost ready to capture data. Google and Moz respond to requests in JSON and so can be queried by many popular programming languages. In addition to my chosen language, PHP, I wrote the results of both Google and Moz to a database, choosing MySQL Community Edition for this. Other databases could also be used, e.g. Postgres, Oracle, Microsoft SQL Server etc. Doing so enables persistence of the data and ad-hoc analysis using SQL (Structured Query Language) as well as other languages (like R, which I will go over later). After creating database tables to hold the Google search results (with fields for rank, URL etc.) and a table to hold the Moz data fields (ueid, upa, pda etc.), we’re ready to design our data harvesting plan.
Google provide a generous quota with the Custom Search Engine (up to 100M queries per day with the same Google developer console key), but the Moz free API is limited to 2,500 rows per month. Moz’s paid options provide between 120k and 40M rows per month depending on the plan, with costs ranging from $250–$10,000/month. Therefore, as I’m just exploring the free option, I designed my code to harvest 125 Google queries over 2 pages of SERPs (10 results per page), allowing me to stay within the Moz 2,500-row quota (125 queries × 20 results = 2,500 rows). As for which searches to fire at Google, there are numerous keyword resources to choose from. I chose to use Mondovo as they provide numerous lists by category and up to 500 words per list, which is ample for this experiment.
I also rolled in a few PHP helper classes alongside my own code for database I/O and HTTP.
In summary, the main PHP building blocks and sources used were:
Google Custom Search Engine – Ash Kiswany wrote an excellent article using Jacob Fogg’s PHP interface for Google Custom Search;
Mozscape API – As mentioned, this PHP implementation for accessing Moz on Github was a good starting point;
Website crawler and HTTP – At Purple Toolz, we have our own crawler called PurpleToolzBot which uses Curl for HTTP and this Simple HTML DOM Parser;
Database I/O – PHP has excellent support for MySQL which I wrapped into classes from these tutorials.
One factor to be aware of is the 10-second interval required between Moz API calls. This is to prevent Moz being overloaded by free API users. To handle this in software, I wrote a "query throttler" which blocked access to the Moz API until the required time between successive calls had passed. However, whilst it worked perfectly, it meant that calling Moz 2,500 times in succession took just under 7 hours to complete. A sketch of the throttling logic follows.
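The author’s throttler was written in PHP; the sketch below shows equivalent logic in R, purely for illustration:

# Block until at least `interval` seconds have passed since the previous call,
# then return the new "last call" timestamp.
throttle <- function(lastCallTime, interval = 10) {
  elapsed <- as.numeric(Sys.time()) - lastCallTime
  if (elapsed < interval) {
    Sys.sleep(interval - elapsed)   # wait out the remainder of the quota window
  }
  as.numeric(Sys.time())
}

# Usage: call throttle() immediately before each Moz API request
lastCall <- 0
for (targetUrl in c("http://www.apple.com/", "http://www.example.com/")) {
  lastCall <- throttle(lastCall)
  # ... make the Moz API call for targetUrl here ...
}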
Analyzing data with SQL and R
Data harvested. Now the fun begins!
It’s time to have a look at what we’ve got. This is sometimes called data wrangling. I use a free statistical programming language called R along with a development environment (editor) called R Studio. There are other languages, such as Stata, and more graphical data science tools like Tableau, but these cost money, and the finance director at Purple Toolz isn’t someone to cross!
I have been using R for a number of years because it’s open source and it has many third-party libraries, making it extremely versatile and appropriate for this kind of work.
Let’s roll up our sleeves.
I now have a couple of database tables with the results of my 125 search term queries across 2 pages of SERPs (i.e. 20 ranked URLs per search term). Two database tables hold the Google results and another table holds the Moz data results. To access these, we’ll need to do a database INNER JOIN, which we can easily accomplish by using the RMySQL package with R. This is installed by typing "install.packages('RMySQL')" into R’s console and loaded by including the line "library(RMySQL)" at the top of our R script.
We can then do the following to connect and get the data into an R data frame variable called "theResults."
library(RMySQL)

# INNER JOIN the two tables
theQuery <- "
    SELECT A.*, B.*, C.*
    FROM (
        SELECT cseq_search_id
        FROM cse_query
    ) A -- Custom Search Query
    INNER JOIN (
        SELECT cser_cseq_id, cser_rank, cser_url
        FROM cse_results
    ) B -- Custom Search Results
    ON A.cseq_search_id = B.cser_cseq_id
    INNER JOIN (
        SELECT *
        FROM moz
    ) C -- Moz Data Fields
    ON B.cser_url = C.moz_url
    ;
"

# [1] Connect to the database
# Replace USER_NAME with your database username
# Replace PASSWORD with your database password
# Replace MY_DB with your database name
theConn <- dbConnect(dbDriver("MySQL"),
                     user = "USER_NAME",
                     password = "PASSWORD",
                     dbname = "MY_DB")

# [2] Query the database and hold the results
theResults <- dbGetQuery(theConn, theQuery)

# [3] Disconnect from the database
dbDisconnect(theConn)
NOTE: I have two tables to hold the Google Custom Search Engine data. One holds data on the Google query (cse_query) and one holds results (cse_results).
We can now use R’s full range of statistical functions to begin wrangling.
Let’s start with some summaries to get a feel for the data. The process I go through is basically the same for each of the fields, so let’s illustrate it using Moz’s ‘UEID’ field (the number of external equity links to a URL). By typing the following into R, I get this:
> summary(theResults$moz_ueid)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      0       1      20   14709     182 2755274
> quantile(theResults$moz_ueid, probs = c(1, 5, 10, 25, 50, 75, 80, 90, 95, 99, 100)/100)
       1%        5%       10%       25%       50%       75%       80%       90%       95%       99%      100%
      0.0       0.0       0.0       1.0      20.0     182.0     337.2    1715.2    7873.4  412283.4 2755274.0
Looking at this, you can see that the data is heavily skewed, as shown by the relationship of the median to the mean, which is being pulled up by values in the upper quartile range (values beyond 75% of the observations). We can, however, plot this as a box and whisker plot in R, where each x value is the distribution of UEIDs for a single Google Custom Search rank position (1–20).
Note we are using a log scale on the y-axis so that we can display the full range of values as they vary a lot!
A box and whisker plot in R of Moz’s UEID by Google rank (note: log scale)
Box and whisker plots are great as they show a lot of information in them (see the geom_boxplot function in R). The purple boxed area represents the Inter-Quartile Range (IQR) which are the values between 25% and 75% of observations. The horizontal line in each ‘box’ represents the median value (the one in the middle when ordered), whilst the lines extending from the box (called the ‘whiskers’) represent 1.5x IQR. Dots outside the whiskers are called ‘outliers’ and show where the extents of each rank’s set of observations are. Despite the log scale, we can see a noticeable pull-up from rank #10 to rank #1 in median values, indicating that the number of equity links might be a Google ranking factor. Let’s explore this further with density plots.
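The plot itself is not reproduced here, but a sketch of how it might be generated with ggplot2 follows, assuming the theResults data frame from the earlier query (the +1 offset and the exact styling are my additions, used to keep zero-link URLs on the log scale):

library(ggplot2)

# Distribution of external equity links (UEID) at each Google Custom Search rank
ggplot(theResults, aes(x = factor(cser_rank), y = moz_ueid + 1)) +   # +1 avoids log10(0)
  geom_boxplot(fill = "purple", alpha = 0.4) +
  scale_y_log10() +
  labs(x = "Google Custom Search rank (1-20)",
       y = "Moz UEID + 1 (external equity links, log scale)")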
Density plots are a lot like distributions (histograms) but show smooth lines rather than bars for the data. Much like a histogram, a density plot’s peak shows where the data values are concentrated and can help when comparing two distributions. In the density plot below, I have split the data into two categories: (i) results that appeared on Page 1 of SERPs ranked 1-10 are in pink and; (ii) results that appeared on SERP Page 2 are in blue. I have also plotted the medians of both distributions to help illustrate the difference in results between Page 1 and Page 2.
The inference from these two density plots is that Page 1 SERP results had more external equity backlinks (UEIDs) than Page 2 results. You can also see the median values for these two categories below, which clearly show how the value for Page 1 (38) is far greater than Page 2 (11). So we now have some numbers to base our SEO strategy for backlinks on; a sketch of the plotting code follows the snippet below.
# Create a factor in R according to which SERP page a result (cser_rank) is on
> theResults$rankBin <- paste("Page", ceiling(theResults$cser_rank / 10))
> theResults$rankBin <- factor(theResults$rankBin)

# Now report the medians by SERP page by calling 'tapply'
> tapply(theResults$moz_ueid, theResults$rankBin, median)
Page 1 Page 2
    38     11
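Using the rankBin factor just created, the density plot described above can be sketched like this (the pink/blue fills follow the description of the figure; the +1 offset is again my addition for the log scale):

library(ggplot2)

pageMedians <- tapply(theResults$moz_ueid, theResults$rankBin, median)

ggplot(theResults, aes(x = moz_ueid + 1, fill = rankBin)) +
  geom_density(alpha = 0.5) +
  scale_x_log10() +
  scale_fill_manual(values = c("Page 1" = "pink", "Page 2" = "lightblue")) +
  geom_vline(xintercept = pageMedians + 1, linetype = "dashed") +   # median markers
  labs(x = "Moz UEID + 1 (log scale)", fill = "SERP page")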
From this, we can deduce that equity backlinks (UEID) matter and if I were advising a client based on this data, I would say they should be looking to get over 38 equity-based backlinks to help them get to Page 1 of SERPs. Of course, this is a limited sample; more research, a bigger sample, and other ranking factors would need to be considered, but you get the idea.
Now let’s investigate another metric that has a much smaller range than UEID and look at Moz’s UPA measure, which is the likelihood that a page will rank well in search engine results.
> summary(theResults$moz_upa)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1.00   33.00   41.00   41.22   50.00   81.00
> quantile(theResults$moz_upa, probs = c(1, 5, 10, 25, 50, 75, 80, 90, 95, 99, 100)/100)
  1%   5%  10%  25%  50%  75%  80%  90%  95%  99% 100%
  12   20   25   33   41   50   53   58   62   75   81
UPA is a number given to a URL and ranges between 0–100. The data is better behaved than the unbounded UEID variable: its mean and median are close together, making for a more ‘normal’ distribution, as we can see below by plotting a histogram in R.
A histogram of Moz’s UPA score
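A sketch of the histogram in ggplot2 (the bin width is my choice, not necessarily the one used for the original figure):

library(ggplot2)

# Distribution of the UPA score across all harvested results
ggplot(theResults, aes(x = moz_upa)) +
  geom_histogram(binwidth = 5, fill = "purple", alpha = 0.6, colour = "white") +
  labs(x = "Moz UPA score", y = "Count of URLs")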
We’ll do the same Page 1 : Page 2 split and density plot that we did before and look at the UPA score distributions when we divide the UPA data into two groups.
# Report the medians by SERP page by calling 'tapply'
> tapply(theResults$moz_upa, theResults$rankBin, median)
Page 1 Page 2
    43     39
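The corresponding Page 1 vs. Page 2 density plot can be sketched exactly as before, simply swapping in the UPA score (no log scale is needed this time):

library(ggplot2)

ggplot(theResults, aes(x = moz_upa, fill = rankBin)) +
  geom_density(alpha = 0.5) +
  scale_fill_manual(values = c("Page 1" = "pink", "Page 2" = "lightblue")) +
  geom_vline(xintercept = tapply(theResults$moz_upa, theResults$rankBin, median),
             linetype = "dashed") +   # median markers for each SERP page
  labs(x = "Moz UPA score", fill = "SERP page")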
In summary, two very different distributions from two Moz API variables. But both showed differences in their scores between SERP pages and provide you with tangible values (medians) to work with and ultimately advise clients on or apply to your own SEO.
Of course, this is just a small sample and shouldn’t be taken literally. But with free resources from both Google and Moz, you can now see how you can begin to develop analytical capabilities of your own to base your assumptions on rather than accepting the norm. SEO ranking factors change all the time and having your own analytical tools to conduct your own tests and experiments on will help give you credibility and perhaps even a unique insight on something hitherto unknown.
Google provide you with a healthy free quota to obtain search results from. If you need more than the 2,500 rows/month Moz provide for free, there are numerous paid-for plans you can purchase. MySQL is a free download and R is also a free package for statistical analysis (and much more).
Go explore!